ABSTRACT

The purpose of this research was to perform computer 
analysis and modification of complex musical tones and to develop 
models of perceptual and learning processes in music. Analysis of the 
physical attributes of sound (frequency, intensity, and harmonic 
content, versus time) provided necessary information about the 
musical parameters of intonation, vibrato, dynamics, and rhythm. A 
general purpose digital computer and appropriate analog devices were 
utilized to analyze and synthesize complex musical tones. The 
procedures included the transformation of audio tapes of music to 
digital tapes via a high speed analog-to-digital converter system.

The significance of the research is based on the belief that: 1)
objectifying certain parameters of musical performance will have a
direct bearing on behavioral goals and methods of music education;
and 2) an understanding of the total problem of human information
processing requires a detailed investigation of structured non-verbal
stimuli in the auditory mode. Two supplementary investigations are
included: Computer Analysis of Musical Performance, by Warren C.
Campbell (Appendix I), and The Effects of the Attack Transient on
Aural Recognition of Instrumental Timbres, by Ralph C. Thayer, Jr.
(Appendix II). Appendix III, Computer Analysis System Software, by
Jack Owens, describes a library of computer programs used to perform
certain basic mathematical analyses of the digitized musical
performances. A related document is ED 058 745. (Author/JMB)
SUMMARY



Investigations in speech have shown the complex nature of the 
processing which enables us to decode the acoustic signal of spoken 
language. Constraints on natural language are imposed by both the 
production and receiving mechanisms. The perception of musical
sounds is obviously subject to some of the same constraints, since 
the same receptors and, possibly, some of the same processing 
mechanisms are used for music as for natural language. 

The ability to identify and classify significant perceptual 
parameters of musical performance, and to state the effects of these 
parameters in behavioral terms is a prerequisite to the establish- 
ment of effective training procedures and meaningful behavioral goals 
for music education programs. Knowledge of these parameters will 
enable objective models of musical performance to be developed and 
verified. 

The purpose of this research was to perform computer analysis 
and modification of complex musical tones and to develop models of 
perceptual and learning processes in music. Useful models of aural 
perception in music have been verified by comparing the responses of 
a computer implementation with the responses of appropriate human 
listeners. Analysis of the physical attributes of sound (frequency, 
intensity, and harmonic content, versus time) provided necessary 
information about the musical parameters of intonation, vibrato, 
dynamics, and rhythm. Information has been provided regarding the
relative importance of attack and steady-state portions of tones for
specific instruments. Further, data provided information regarding
brain hemisphere dominance for tasks of identifying instrument 
attacks when two different instruments are presented dichotically. 

In order to analyze and synthesize complex musical tones a 
general purpose digital computer and appropriate analog devices 
were utilized. The procedures included the transformation of audio 
tapes of music to digital tapes (numerical data) via a high speed 
analog-to-digital converter system. Multivariate statistical 
techniques were used to provide improved psychometric capability 
in the perceptual domain. 

The significance of this research is based on the belief that
1) objectifying certain parameters of musical performance will have
a direct bearing on behavioral goals and methods in music education,
and 2) an understanding of the total problem of human information
processing requires a detailed investigation of structured non-verbal
stimuli in the auditory mode.
I. Introduction



The purpose of this research was to perform 
computer oriented analysis and synthesis of complex 
musical tones, and to develop models of perceptual 
and learning processes in music. This effort is 
motivated by the belief that the ability to identify 
and classify significant perceptual parameters of 
musical performance, and to state the effects of 
these parameters in behavioral terms is a pre- 
requisite to the establishment of effective training 
procedures and pertinent behavioral goals for 
public school and college music programs. Know- 
ledge of these parameters should enable objective 
models of musical performance to be developed and 
verified. This is, of course, not a new idea. One 
of the earliest proponents of this point of view
is Carl E. Seashore (Psychology of Music, McGraw-Hill,
1938). Additional impetus is given to this
work by related experimentation in linguistics
and psychology, which points out that an understanding
of the total problem of human information
processing requires a detailed investigation of 
structured non-verbal stimuli in the auditory mode. 



Music Analysis 

The auditory characteristics of musical performance
can be objectively examined in terms of the
physical characteristics of the complex acoustic 
wave generated by the performer. For this research, 
the fact that there are visual aspects and other 
contingencies that may influence listener response 
to a performance is specifically not considered. 

This restricted situation is comparable to an 
individual listening to a tape or disc recording; 
the sound pattern can be completely represented 
at any of several points in the auditory channel 
as the variation of a single parameter plotted 
against time. Traditionally, if the vibration is 
reasonably repetitive, this information is reduced 
by introducing the concepts of frequency, intensity, 
harmonic content, and spectral distribution of 
noise, as functions of time. Their use allows for
significant data reductions without serious loss
of information of importance in the analysis of 
music. 
Every perceptually significant nuance of the
auditory component of musical performance, no
matter how complex, can be described as a function
of these attributes of sound waves. This statement
should not be read as minimizing the extreme
complexity of the subtleties and contextual dependencies
involved in artistic performance.

Many laymen and musicians believe that attempts
to analyze artistic performance are futile. They
contend that individual difference is an overriding
characteristic of artistic performance. The
importance of individual difference is not denied,
particularly when considered as the hallmark that
permits performer identification. However, there
is a need to understand the constituent elements
and limits of variability common to all artistic
performance, as a prerequisite to any discussion of
the effect on the listener of individual difference.

Musicians probably will agree that before one
can produce an artistic performance, there are 
certain technical demands of performance which must 
be satisfied. Control of these technical aspects, 
such as intonation, dynamics, tone quality, 
rhythm, and attack, is a prerequisite to performance 
at a professional level. 

An analogy can be drawn between the musician
and the literary writer. The latter cannot
create a literary masterpiece until he has learned
to control language. The writer who achieves
optimal creative production through written
communication must first master the language and
make it subject to his control and manipulation.

It is the same in the creative production of music.
The artists who achieve a high level of performance
in music will emerge from that group of musicians
who have refined the technical aspects of performance.

This project was in no way an attempt to
mechanize the creative process or to place restraints
upon those persons who are capable of making unique
contributions to music. Rather, the attempt was
to identify the perceptually significant aspects of
music by analyzing model performances that exemplify
a high level of musicianship (as judged by professionals),
and by comparing these to performances that
fall short of this level.

The basic analysis may be thought of as a
mapping of auditory characteristics from the perceptual
to the vibrational frames of reference, and
the reverse. For each perceptual event, a vibrational
counterpart or correlate is sought which can
be described in terms of the parameters indicated
below:



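AUDITORY MAPPING

Vibrational Frame                        Perceptual Frame
(characteristics of the complex wave)    (subject response)

frequency                    ----------  pitch
sound power                  ----------  loudness
harmonic and noise content   ----------  tone quality
duration                     ----------  rhythm, attack

[In the original diagram, solid lines join the two entries on each
row; dashed lines lead from "sound power" to perceptual parameters
on other rows.]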



The relationships indicated by the solid lines
between the vibrational and perceptual parameters
are first-order effects. Functional relationships
and difference limens were established for some of
these main effects by the early experimenters
in audiology. For example, the relationship between
sound power or intensity and loudness can be roughly
represented by converting power level readings in
watts to the decibel scale:

$$\text{loudness (dB)} = 10 \log_{10} \frac{P}{P_0}$$

where P is the sound power, and P_0 is a reference
level. An approximation to musical pitch can be
obtained by converting frequency to a measure
representing pitch in the tempered scale:

$$\text{tempered pitch (semitones)} = 12 \log_{2} \frac{f}{f_0}$$

where f_0 is the reference frequency, assigned a pitch of zero,
and the octave above f_0 (f = 2 f_0) is represented as "12"
on the semitone scale. The same procedures can
be followed for approximating musical pitch to
other scales (Pythagorean, Just, etc.).
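
The two first-order mappings above can be sketched in a few lines of Python. This is only an illustration, not the project's software: the function names `level_db` and `tempered_pitch`, and the reference values of 1e-12 watts and 440 Hz, are assumptions made for the example.

```python
import math

def level_db(power, ref_power):
    """First-order loudness estimate: power ratio expressed in decibels."""
    return 10.0 * math.log10(power / ref_power)

def tempered_pitch(freq, ref_freq):
    """Semitones above ref_freq on the equal-tempered scale."""
    return 12.0 * math.log2(freq / ref_freq)

# Assumed reference values, chosen only for illustration.
REF_POWER = 1.0e-12   # watts
REF_FREQ = 440.0      # Hz

print(level_db(1.0e-6, REF_POWER))       # a power ratio of one million
print(tempered_pitch(880.0, REF_FREQ))   # one octave above the reference
```

A mapping to another scale (Pythagorean, Just) would replace the logarithmic formula with a lookup against that scale's interval ratios.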
These transformations are attempts to represent
the perceptual domain in terms of mappings from the
vibrational domain. They are "first-order"
approximations because they take into account the
most important perceptual-vibrational links. However,
the nuances of musical performance require
that second- and perhaps third-order effects be
represented in the mapping if prediction of the
musically significant responses to a performance
is to be achieved.

A known second-order effect is the change in
pitch introduced by an intensity change in a tone
held at constant frequency. Second-order effects
can be categorized as the predictable changes in one
perceptual dimension due to changes in the vibrational
correlate of another perceptual dimension.

Seashore calls them the "normal illusions" of
auditory perception. In terms of the diagram of
auditory mapping, these effects would be indicated
by the dashed lines connecting different levels
(only those leading from "sound power" are shown).
While many of these effects have been investigated
for pure tones and static situations, their
importance and application in a musical context
have not, in most cases, been delineated.



Development of Models 

Investigations in speech have shown the complex
nature of the processing which enables us to 
decode the acoustic signal of spoken language. 
Constraints on natural language are imposed by 
both the production and receiving mechanisms. 

The perception of musical sounds is obviously 
subject to some of the same constraints, since the 
same receptors and, possibly, some of the same 
processing mechanisms are used for music as for 
natural language. 

Perceptual parameters such as pitch, duration
and tone quality, that are a regular part of the
musician's vocabulary, are basic to an understanding
of language processes (Lieberman, 1967).
While the ear is the common receptor for speech,
music and noise, the possibility has been raised
that more than one mode of processing may be
operating, that a speech/non-speech dichotomy
may exist. The utility of basic brain-function
models can be better evaluated when data from
research in music/speech perception are presented.

For example, "hemisphere dominance" appears to be
a uniquely human characteristic, which is a vital
part of the human linguistic capability. One
of the features associated with this phenomenon
is the differential processing of auditory signals,
depending on the type of signal input.

Music of all types consists of highly organised 
sets of complex auditory relationships which are 
processed by humans. An understanding of the feature 
extraction and organizing function of the brain 
for non-verbal auditory phenomena such as music 
may provide clues to the building and refining of 
models of human information processing. The 
pattern recognition and processing capability in 
language has been studied extensively. However, 
studies in linguistics and psychology have not 
developed testable models of information processing 
which isolate the unique characteristics of music. 

While the evidence suggests that the auditory
processing mode is dependent upon the type of
input, the status of music in this apparent
dichotomy has not been established. If there
are two modes for processing auditory signals, the
possibility exists that a perceptual disability,
such as autism, may selectively interfere with
only one of these processing modes. Knowledge of
this condition may allow for the development of
early diagnostic tests for the severity of the
disability, and at the same time indicate the
possibility of new communication techniques to
improve the compensatory use of intact processing
modes.



Knowledge of auditory processing modes is
basic to an understanding of the acquisition of
linguistic and musical skills by the normal child.
The acquisition of reading skills and the development
of tonal memory are basic educational problems
certain to be affected by a knowledge of processing
modes. In the course of normal development, the
auditory processing mechanism apparently undergoes
important changes up to about the age of five
(Kimura, 1967). Beyond this age, at least with
respect to auditory pattern recognition, we may
all be "perceptually handicapped". The learning of
basic sounds of language must occur before a
critical age for the discrimination to be fully
integrated into the language function. On the
basis of results reported by Kimura (1964), it
is expected that this will also be true for music.
Detailing the differences between the perception
of speech and non-speech has begun to lead to
specific models for the understanding of this
developmental change.



II. RESEARCH DIRECTIONS: EXPERIMENTS AND MODELS



A. Performance Adjudication and Analysis 

Evaluation of student achievement is a central
problem at all levels of education. Grading
student output is a major chore for most teachers,
and is often the one in which they find the least
satisfaction. The problem is a particularly
difficult one in music education, where the
musical performance is the focus for student achievement.
Experienced music teachers and performance
judges contacted in the course of this project
were skeptical of their own ability to maintain
objectivity and uniformity under many of the
conditions encountered in practical adjudication
situations. No studies were found which dealt
directly with the problem of the reliability of this
type of evaluation of musical performance.

Information is required concerning both the regrade
reliability of a single judge, and the inter-judge
agreement for groups of judges when dealing
with various types of musical performances.



Studies of this type have a long history in
the area of the grading of student essays. For
example, studies by Findlayson (1951) and
Phillips (1948) show a self-correlation among
graders of between .60 and .70. That is, when
a teacher is asked to regrade a set of essays
after a time-lapse, the agreement between the two
sets of grades is only 36 to 49 percent better
than would be expected on the basis of chance
alone. Inter-judge correlations are in general
even lower than this. Page and Paulus (1968)
report inter-judge correlations ranging from .43
to .59 obtained by comparing the grades given
by experienced judges on a set of student essays.
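
The step from a self-correlation of .60 to .70 to a figure of 36 to 49 percent is the squaring of the correlation coefficient. A minimal Python sketch of that arithmetic follows. The two grade lists are invented for illustration and the helper name `pearson_r` is an assumption; neither comes from the cited studies.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical grades given by one teacher to the same eight essays
# on two occasions (invented data).
first_pass = [3, 4, 2, 5, 3, 4, 1, 5]
second_pass = [4, 3, 2, 5, 2, 5, 2, 4]

r = pearson_r(first_pass, second_pass)
print(f"self-correlation r = {r:.2f}, r squared = {r * r:.2f}")

# The correlations quoted in the text, converted to percentages.
for quoted_r in (0.60, 0.70):
    print(quoted_r, "->", round(quoted_r ** 2 * 100), "percent")
```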

An effort to improve the reliability of essay 
grading in an effective and practical way was 
reported by Page (1966) in an article entitled 
"The Imminence of Grading Essays by Computer." 

The approach taken by Page uses a simulation
of the human grading process through the use of a
digital computer. A computer analysis of essay
features that can be given an actuarial representation
is used to define the score for a given essay.

[Figure 1]

The averaged score from several human judges is
used as a criterion, since it has a higher
reliability than the score given by a single judge,
when the judges are all from the population of
interest. The effectiveness of this approach,
reported by Page and Paulus (1966), provided the
stimulus for a similar application to the grading
of musical performances, which is the subject of
this section of the report. A pivotal premise
for both studies is the possibility of defining
variables which are useful correlates for the
subjective variables influencing the human judge
as he rates a given example of student output.
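
The gain in reliability from averaging several judges can be estimated with the Spearman-Brown formula. This is a standard psychometric result that the report does not cite by name, and it is brought in here only as an illustration. It assumes the judges are interchangeable, with an average inter-judge correlation r; the reliability of the mean of k judges is then k r / (1 + (k - 1) r). The values of r below are the inter-judge range quoted from Page and Paulus, and k = 7 matches the size of the judging panel used in this project.

```python
def spearman_brown(r, k):
    """Reliability of the mean of k judges with average inter-judge correlation r."""
    return k * r / (1.0 + (k - 1) * r)

for r in (0.43, 0.59):
    averaged = spearman_brown(r, 7)
    print(f"single-judge r = {r:.2f} -> 7-judge average = {averaged:.2f}")
```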



Statement of Purpose 

This investigation is an attempt to add to
the body of knowledge related to the following
questions:

1. Are there operationally definable (i.e.,
objective, measurable) variables for which musical
performance limits can be established?

2. If so, what are they and where are the
limits of acceptability for various performance
situations?

3. Can a knowledge of such variables be made
useful in solving the problems of music education?



Statement of Problem

Given a set of scores assigned by a group of
competent human judges to a set of student musical
performances, are there objective features of the
recorded performance which can be used, with
suitable computer analysis, to predict the
averaged judges' score?



Method of Solving the Problem 

The following steps were taken in this
investigation: (See Figure 1.)

1. A sample of 62 recorded performances was
selected from the 1967 Connecticut All-State music
auditions.

2. A group of seven competent judges was hired
to grade each performance on five categories. The
judges' averaged score in each category for each
performance served as the criterion measure for
computer grading.

3. Six objective features of the performances
thought to be potentially useful for predicting
performance quality were selected on an a priori
basis. Specialized machine and hand techniques were
developed to process the performances to extract the
features. Values of the features associated with
each note of each performance were punched on
computer cards.

4. The digital computer was employed to process
the feature values for normalization, recovery of
initial data, and data reduction. Thirteen
predictor values for each performance were derived
from the original six features.

5. Two different performance standards were used
in deriving the predictors, and the results compared.

6. The thirteen predictor variables were
entered into a multiple regression program on the
digital computer, and weightings were assigned
to each variable in order to maximize the correlation
between the predicted and the criterion grade.

7. The correlations were empirically cross-validated
on subsets of the original sample in
order to determine how well this program could be
expected to predict scores for a new sample of
performances.
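
Steps 6 and 7 can be illustrated with a small Python sketch. It is not the project's program: it uses ordinary least squares rather than the stepwise procedure, three predictors rather than thirteen, and synthetic data in place of the judges' scores. All function names and numeric values are assumptions made for the example.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(rows, targets):
    """Least-squares weights (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve(xtx, xty)

def predict(weights, row):
    """Predicted criterion score for one performance."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

def pearson_r(x, y):
    """Correlation between predicted and criterion scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Synthetic stand-in data: 62 performances, 3 predictors (invented values).
random.seed(0)
features = [[random.gauss(0, 1) for _ in range(3)] for _ in range(62)]
criterion = [3.0 + 0.8 * f[0] - 0.5 * f[1] + random.gauss(0, 0.5) for f in features]

# Fit on one subset, then check the correlation on the held-out subset.
train_x, test_x = features[:40], features[40:]
train_y, test_y = criterion[:40], criterion[40:]
weights = fit(train_x, train_y)
fit_r = pearson_r([predict(weights, f) for f in train_x], train_y)
cross_r = pearson_r([predict(weights, f) for f in test_x], test_y)
print(f"fitted r = {fit_r:.2f}, cross-validated r = {cross_r:.2f}")
```

The correlation on the held-out subset is the quantity of interest, since the fitted correlation is computed on the same performances that determined the weights and is therefore optimistic.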



Significance 

The investigation of objective features of
performance to determine their usefulness in
predicting judges' responses is of importance in
at least two major areas of music education: (i) the
evaluation of student achievement and (ii) the
development of practice and diagnostic sessions in
which computer generated feedback supplements
regular tutorial procedures (i.e., Computer
Assisted Instruction).



Evaluation of Student Achievement



The evaluation of student musical performance
by human judges would appear, from the results
of this investigation, to be at least as reliable
as the evaluation of student prose. Further
confirmation of this result is required. The
computer, given a valid regression equation,
will produce an identical evaluation each time
a particular performance is submitted and will
apply ratings without the biases introduced into
human judging by fatigue, effects of performance
juxtaposition, knowledge of student personality,
sex, race and appearance.

A successful simulation of the judging of
musical performances by human auditioners has
an immediate application to the problem of
auditioning large numbers of student musicians
for placement in schools and musical programs.

Many states have "All-State" band, chorus and
orchestra festivals to give experience and encouragement
to music students. The process of
auditioning hundreds of these students and
giving each a fair chance to demonstrate his ability
is a very difficult one. Computer grading may
well provide the answer to this problem.

A direct comparison of student achievement 
or evaluation techniques between groups separated 
in either place or time has not been practical 
in the past. For example, it might be desirable 
to know how this year's students would be rated
by last year's judges. The judges, as a group,
may no longer be available. However, if an
adequate sample of their scoring is available,
a valid computer simulation would provide an
accurate representation of their collective
opinion on this year's performances, or on any
other set of recorded performances.

This procedure is not limited to a particular 
standard of judgement. With judges representing
several schools of thought or stylistic preference,
categorization of performers with regard to
stylistic suitability could be done effectively
and efficiently. For example, a choral conductor
might select an ideal grouping for a main and
an echo choir, or choose the voices most appropriate
for a madrigal group, without the necessity
of individual auditions each time a new partitioning
of the chorus is required.



Application to Computer Assisted Instruction

Kuhn and Allvin (1967), reporting on work
by Spohn (1959) and Carlsen (1963) in the area
of programmed instruction, state (p. 2):

"As these two examples indicate,
programming techniques for music instruction
have been successfully undertaken
as long as essentially verbal or symbolic
response modes have been used, such as
multiple choice, true-false, comparison,
notation, and so forth. Typical response
modes in such studies are marking, pointing,
using a typewriter, and so forth.

Obviously, a new, direct, and
essentially musical response mode is
needed."

These two authors go on to describe an experiment
in CAI in which a musical response mode is an
essential part. Student vocal response was
analysed with a "pitch extractor", and the resulting
information was fed into an IBM 1620 computer
for processing. Evaluation was made in terms of
deviation limits from the true pitch, using equal
temperament. Students were able to select limits
of either a two percent or four percent range.
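
A deviation-limit check of this kind might look like the following Python sketch. It is not the Kuhn and Allvin program. The 440 Hz tuning reference is an assumption, and so is the reading of the "two percent or four percent range" as a deviation of at most that percentage of the target frequency in either direction; the report does not say which interpretation was used.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; the report does not state one

def nearest_tempered(freq):
    """Return the equal-tempered frequency nearest to freq (relative to A4_HZ)."""
    semitones = round(12.0 * math.log2(freq / A4_HZ))
    return A4_HZ * 2.0 ** (semitones / 12.0)

def within_limit(freq, target, percent):
    """True if freq deviates from target by no more than percent of target."""
    return abs(freq - target) <= target * percent / 100.0

sung = 450.0  # hypothetical measured frequency in Hz
target = nearest_tempered(sung)
for limit in (2.0, 4.0):
    print(f"{limit:.0f} percent limit:", within_limit(sung, target, limit))
```

In an instructional setting the target would normally be the notated pitch rather than the nearest tempered pitch; `nearest_tempered` is included only so the example runs without a score.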

Deihl and Radocy (1969) report a similar 
effort, using an IBM 1500 Instructional System.

Here, however, student response was off-line, 
but mention is made of the possibility of future 
on-line interaction. 

These examples indicate the directions being 
taken in the application of CAI to music education. 
However, none of these studies comes to grips
with a fundamental problem common to all attempts
at computer interaction with the performer:
the relationship between the acoustic variables
being measured, and the subjective responses of
competent listeners. Training in the control
of an acoustic variable does not necessarily 
result in the musical control of a subjective 
variable. 
The approach used in this investigation provides
a test of the relationship between the acoustic
parameters chosen, and the subjective responses
indicated by the judges. Once a stable relationship
has been discovered, a training procedure
based on manipulation of the acoustic parameter
can be developed, and the results can be anticipated
with some assurance of success.



Methodology 

The simulation techniques used in this experiment
were patterned after those used in Project
Essay Grade (Page and Paulus, 1968), an investigation
supported by the U.S. Department of Health,
Education and Welfare (Project No. 6-1318).
Differences which occur between this project
and "Project Essay Grade" are those imposed by
the different mediums being explored, rather than
any difference in approach. These appear primarily
in the area of feature extraction. For the essay
grading project, the data, consisting of handwritten
student prose, was punched as literally
as possible into IBM cards. Features were
extracted by a computer analysis of the text:
spelling, punctuation, word length, sentence
length, etc. In the music project, features are
extracted from a recording on magnetic tape of the
performance to be analysed. For the essay grading
project, judgements in five categories were
predicted for 256 essays using thirty predictors.
In the music project, judgements in five categories
were predicted for 62 vocal performances using
thirteen predictors.



Psychometrics and Statistics 

The basic conceptual framework for evaluation
and prediction in this investigation is found
within the disciplines of psychometrics and
statistics, as presented in many fundamental
texts such as those by Hays (1963), Kelley (1948)
and Winer (1963). Rozeboom (1966) was particularly
useful in his discussion of homogeneity, reliability
and validity. The programs for correlation were
based on Cooley and Lohnes (1962). The stepwise
program from the System/360 Scientific
Subroutine Package (IBM, 1968) was used for the
multiple regression analysis.



Pattern Recognition 



In a general sense, a simulation which involves 
responses to an input data set is a problem in 
the field of pattern recognition. In this case 
a pattern must be detected in the acoustical 
signal which can be used to predict the response 
of musical judges to that signal. A presentation 
of basic concepts in pattern recognition may be 
found in Sebestyen (1962) and in Nilsson (1965).
Statistical models are an important part of 
pattern recognition procedures, with greater 
emphasis on decision making than is usually found 
in statistical usage. The extraction of features 
from which decision boundaries can be constructed 
is another fundamental process in pattern recogni- 
tion, but there are no general algorithms for 
extracting useful features from a data base. 

Feature definition must be based on the attempts 
of previous investigations to find measurable 
aspects of the data field. Features which are 
useful for prediction are often specific to a 
given application. To be useful, features which
are based on a nominal or ordinal scale must 
correlate, either separately or in combination, 
with the criterion. Nominal features must be 
evaluated using categorical decision processes. 

For this investigation, the features extracted 
were all given interval scale interpretations, 
and were based in part on the findings of studies 
in musical acoustics and the psychology of music, 
discussed in the next section. 



Musical Acoustics and Special Equipment 

The development of techniques for operationally
defining important acoustic variables owes its
greatest advancement in the 19th century to
H.L.F. Helmholtz (1877). While others in this
and earlier centuries laid the foundations for
vibrational phenomena (Young (1784), Rayleigh
(1877), Fourier (1822), etc.), Helmholtz explored
the perceptual problems as well. Of the many
recent publications in musical acoustics, books
by Benade (1966) and C.A. Taylor (1965) have
been particularly helpful in the present investigation.

At the turn of the century, the career of 
Carl E. Seashore, who dominated the field until
World War II, was just beginning. A frequently
stated objection to many experiments in the
psychology of music is that they are conducted in
such a way as to remove them from the reality of
musical performance. Studies in perception
using isolated tones, pure tones and other
convenient simplifications do not always generalize
to real performance situations. Seashore tried
to avoid this problem by working primarily from
"live" as opposed to laboratory situations. The
same attempt at relevance is made in this project. 

In one of his first papers, A Voice Tonoscope 
(1902), Seashore described a device that was able
to present visually, in graph form, the fundamental
frequency of a live performance as a function of
time. This device made it possible to measure
systematically variables thought to be related
to musical perception, and to test hypotheses
about them.

The equipment used to process the acoustic
signal in this investigation, a modern version of the
tonoscope, was planned and tested by J. Heller and
W. Campbell. Its use in a related application,
visual pitch matching, is described in a report
by J. Heller (1969). A standard Frequency
Modulation Sub-carrier Discriminator (Electro-
Mechanical Research #287A-01), normally used in
telemetry applications, was used as a frequency
meter. It was modified by the manufacturer to
operate over a pitch range of one octave from
slightly above middle C to the next octave (266.7
Hz to 533.3 Hz, or 400 Hz ± 33%). Using a
Schmitt trigger, the discriminator produces a DC
output proportional to the input frequency, over
the input range: v = k(f - f'), where v = output
voltage, f = frequency of input signal, f' =
center frequency, in this case 400 Hz, and k is
the proportionality constant. The output voltage
was presented as a function of time on paper
charts using an Esterline Angus Speed Servo
Recorder/S601 S.
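
The linear relation between output voltage and input frequency described above can be written out and inverted, as in this Python sketch. The center frequency and acceptance range are taken from the text. The value of the proportionality constant k is not given in the report, so the figure used here is an assumption, as are the function names.

```python
CENTER_HZ = 400.0               # center frequency f' given in the text
LOW_HZ, HIGH_HZ = 266.7, 533.3  # one-octave acceptance range given in the text
K_VOLTS_PER_HZ = 0.01           # assumed proportionality constant (not in the text)

def discriminator_volts(freq_hz):
    """DC output v = k (f - f') for an input inside the acceptance range."""
    if not LOW_HZ <= freq_hz <= HIGH_HZ:
        raise ValueError("frequency outside the one-octave acceptance range")
    return K_VOLTS_PER_HZ * (freq_hz - CENTER_HZ)

def chart_to_hz(volts):
    """Invert the relation to read frequency back from a chart voltage."""
    return CENTER_HZ + volts / K_VOLTS_PER_HZ

v = discriminator_volts(440.0)
print(v, chart_to_hz(v))
```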

A Sony Model TC-5600 Tape Recorder, with 
variable speed control, was used to adjust the 
performance pitch range to the acceptance range 
of the frequency meter. By noting the actual starting
pitch for each performance, and entering these
data into the computer program, all pitch and
time values were restored to correspond to the
original performance speed. These calculations
are detailed in Appendix B.
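
Appendix B is not reproduced here, so the following Python sketch is only a plausible reconstruction of the speed correction, not the project's calculation. It assumes that changing the tape speed by some ratio multiplies every frequency by that ratio and divides every duration by it, and that the ratio can be inferred from the measured and actual starting pitch. The function names and sample values are hypothetical.

```python
def speed_ratio(measured_start_hz, actual_start_hz):
    """Playback speed relative to original speed, inferred from the first note."""
    return measured_start_hz / actual_start_hz

def restore(measured_hz, measured_seconds, ratio):
    """Map a measured (frequency, duration) pair back to original tape speed."""
    return measured_hz / ratio, measured_seconds * ratio

# Hypothetical case: a performance starting on 220 Hz is played back at
# double speed so that it falls inside the 266.7 to 533.3 Hz range.
ratio = speed_ratio(440.0, 220.0)
print(restore(330.0, 0.25, ratio))
```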

The choice of features which are related, at
least indirectly, to the perceptually significant
aspects of the acoustic signal is an essential
part of this approach. Clues to the nature of these
features can be found in the many sources in the
psychology of music. Seashore (1938) provides
a detailed account of the relationships between
the objective and perceptual variables in his
chapters on Pitch, Loudness, Duration and Timbre.

He also enumerates (Seashore, 1938, p. 28) a
list of basic principles in the psychology of
music which are taken, with only minor changes,
as fundamental concepts for this investigation.

More recent studies (J.D. Harris (1952), J.C.R.
Licklider (1956), Robinson and Dadson (1956)) have
provided details and further clarification of the
illusions of hearing referred to by Seashore.



Summary



The first experiment completed under the grant
was a computer simulation of an adjudication of
musical performance. In this investigation, a
panel of seven competent musical judges was asked
to audition a set of sixty-two short vocal
performances by students, which were recorded on
audio tape. The judges were to respond by scoring
each performance on a five point scale in several
categories, such as intonation, dynamics, etc.

The average of the judges' scores was found to be 
stable, and was the criterion for a computer 
simulation of the judges' response, using multiple 
regression analysis. The prediction of the 
judges' response to each performance was based on 
frequency, intensity and duration measurements of 
the performance. 
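
The idea of predicting an averaged score from physical
measurements can be sketched as an ordinary least-squares fit.
The code below is a generic illustration, not the program used
in the study; the feature columns and all names are hypothetical.

```python
def fit_linear(features, scores):
    """Least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, scores))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * p for x, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, feature_row):
    """Predicted score for one performance."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))
```

Each row of features would hold the frequency, intensity and
duration measurements for one performance, and scores would
hold the corresponding averages of the judges' ratings.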

This approach provides a means of examining 
the relationship between performance competence 
as rated by experienced judges, and the vibrational 
characteristics of performance. Specific 
vibrational patterns can be tentatively identified 
as correlates of a particular response from the 
panel of judges. Synthesis of these vibrational 
patterns can then be used to test the possibility 
of a cause and effect relationship between the
pattern and the response. The results of this
experiment were very encouraging, particularly in
view of the highly simplified (linear) predictive
model employed.



Some idea of the simulation capability achieved
in this first attempt can be seen in the following
table. The numbers indicated are typical
correlation values. A correlation of unity means
perfect agreement, and a value of zero indicates
a random relationship.

Score Comparison                      Correlation

Between two groups of
judges (group averages)               .85

Between computer scores
& judges' average                     .65 (shrunken
                                      mult-r)

Between one judge and
the group average                     .65

Between two judges                    .40



From this it can be seen that the computer comes
as close to predicting the group average as the
typical experienced judge (.65). The correlation
between any two judges (.40) is considerably lower
than this value. Refinements in technique are
expected to further improve computer prediction of
judges' scores. A complete report of this
experiment appears in Appendix I.
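
For readers unfamiliar with the correlation values quoted
above, the following sketch computes a product-moment (Pearson)
correlation between two lists of scores. We assume this is the
kind of coefficient reported; the function name is ours.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Applied to one judge's scores and the group average over the
same performances, a value near unity would indicate close
agreement.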



B. Performance Modification and Listener Response 
Introduction 



A second experiment completed during the
grant period was designed to determine the 
effect of the attack transient upon the recognition 
of instrumental timbres. This study approached the 
problem by mechanically replacing the attack of 
one instrument by the attack of another, to determine 
if the listener is influenced in his attempt to 
recognize instruments more by the attack or by the 
steady-state portion of the tone. This controlled 
modification of natural sounds was used to test 
hypotheses regarding differences between instrumental
timbre recognition for normal, altered, and
"no-attack" tones.



Related Literature 



Several previous studies of timbre have 
recognized the importance of the attack transient 
in timbre recognition. Seashore (1938) alludes
to a study by Lewis and Cowan (1937) who
mechanically replaced even vocal releases (decay) with
gliding attacks. "The musically acceptable glide
at the beginning of the tone became utterly
intolerable when placed at the end of a tone."
Seashore does not investigate the effects of attack
or decay on sonance (tone quality), but states,
"This experiment opens a very fertile field for
the investigation of reasons for adaptive or
habitual hearing."

Nolle and Boner (1941) found, in investigating 
the initial transients of organ pipes, that "these 
initial transients seem to be important in
determining the subjective character of pipe organ 
music and in differentiating between the subjective 
character of pipe organ music and of music produced 
by electronic instruments." 

In synthesizing instrumental tones, studies
by Fletcher, Blackham, and Stratton (1962),
Strong and Clark (1967), and Risset and Mathews
(1969) have stated that the attack plays an
important role in recognition of the instrument.

On the basis of analyses of the harmonic 
content of various wind instruments, with one 
result being that little substantial difference 
was found to exist between the harmonic content
of certain instruments, Saunders (1946) concluded,
"These facts lend support to the idea which is
often expressed that an oboe and a violin would be
indistinguishable if one were prevented from hearing
the beginnings or the endings of the tones."

Richardson (1954), in an investigation of
transients by spectrum analysis, states, "In spite
of their evanescent nature, the view is now held
that it is these transients which enable the
listener to distinguish the sounds of different
musical instruments or between two of the same
class. The transient is indeed part of the 'formant'
of an instrument, and ought to be exhibited as
a characteristic alongside the steady-state
spectrum."

These studies have all indicated, to one
degree or another, that some importance may be
attached to the attack as an indicator of timbre.
However, all of these studies were performed under
conditions in which the evaluators were aware of
what timbre was being considered. Under such
circumstances an evaluator's judgment would tend
to be based on an ideal concept of the particular
tone being judged, rather than a mere identification
of timbre.

A more valid approach to measuring the
importance of attack to timbre recognition would be to
present subjects with unidentified tones, with
and without attacks, to be identified by the
subjects. This method has been utilized in studies
by Berger (1964), and Saldanha and Corso (1964).

Berger presented 30 subjects with the tones
of 10 different wind instruments. The tones were
presented in several forms: unaltered, played
backwards, attack and decay removed, and all
harmonics except the fundamental filtered out.
The results showed 59 per cent of the unaltered
tones identified correctly and 35 per cent of
the tones minus attack and decay identified
correctly. The tones played backwards and the
filtered tones were correctly identified 42 per cent
and 18 per cent of the time, respectively.

Saldanha and Corso, in a similar study,
presented 20 subjects with the tones of 10 string
and wind instruments, unaltered, and with various
alterations. The results showed 41 to 44 per cent
of the unaltered tones correctly identified (the
two figures are a result of longer and shorter
steady-states), and 32 per cent of the tones with
no attack identified correctly. Both of these
studies indicated a strong relation between
attack and timbre identification, but no statistical
analyses of the results are provided to confirm
this relation, nor is the effect of the quality of
the attack investigated.

In spite of the work which has been done in
this field, one finds that the generally accepted
definition of timbre remains as some exclusive
function of the harmonic content. Lundin (1967)
discusses timbre strictly in terms of the steady-
state portion of the tone, and Neilson (1970)
describes timbre as the "relationship of the
decibel strength in the fundamental of a given
tone to that of various overtones."

The present study attempts to demonstrate
whether or not the initial transient is an
integral part of timbre recognition, and, if it
is, to investigate what effects changes in the
quality of the attack have on the recognition
of timbre.



Procedures 

A stimulus tape was prepared which included
120 tones. Four instruments (flute, oboe,
clarinet, and trumpet), each playing three pitches
(d', c'', gb''), were recorded. Each of these
12 tones was modified by replacing the normal
attack by the attack of the other three instruments
on the same pitch. A second modification of
these 12 initial tones was made by eliminating
the attack portion of each tone. This produced
48 tones. The original 12 tones (with no
modification) were also included in the final tape.
Each of these 60 tones was recorded a second
time and placed on the final tape in a random
sequence.
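
The tone counts in this design can be checked with a short
sketch. The tuple layout (pitch, attack source, steady-state
source), the seed, and the use of a simple shuffle are our
assumptions; the report does not say how the random sequence
was drawn.

```python
import itertools
import random

INSTRUMENTS = ["flute", "oboe", "clarinet", "trumpet"]
PITCHES = ["d'", "c''", "gb''"]


def build_stimuli(seed=0):
    """Return the 60 distinct tones and a shuffled 120-item tape order."""
    tones = []
    for steady, pitch in itertools.product(INSTRUMENTS, PITCHES):
        tones.append((pitch, steady, steady))   # normal tone
        tones.append((pitch, None, steady))     # attack removed
        for attack in INSTRUMENTS:
            if attack != steady:
                tones.append((pitch, attack, steady))  # borrowed attack
    sequence = tones * 2                        # each tone appears twice
    random.Random(seed).shuffle(sequence)
    return tones, sequence
```

The 12 normal tones, 36 borrowed-attack tones and 12 no-attack
tones give the 60 distinct items, and doubling them gives the
120 tones on the tape.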

The stimulus tape was administered to three
groups of subjects: high school instrumentalists
(n=$7), college students enrolled in an
introduction to music history course (non-music majors,
n=43), and college music majors including several
music faculty and professional musicians (n=38).



Results 



The flute and clarinet steady-state tones 
were identified correctly (82% and 79% respectively) 
more than the oboe and trumpet steady-state tones 
(70% and 77%). The identification of the flute
was slightly less accurate than that of the
clarinet when preceded by other attacks, but
was more accurate when preceded by its own or
no attack. The attack portions of flute and
clarinet, however, are identified quite differently
from each other. The flute attack was least often
correctly identified and the clarinet attack 
most often correctly identified. This indicates 
that the flute provides very strong identification 
information in its steady-state, but very 
little in its attack. 

The oboe was the least recognizable
instrument when presented without modification (70%
correct). The trumpet was recognized somewhat
better than the oboe (77% correct). With the
removal of the attack, the trumpet dropped
well below the oboe in degree of recognizability.
With the addition of attacks of the other
instruments, oboe identification scores dropped
considerably (from 63% to 51%), the trumpet scores only
slightly (from 45% to 42%); nevertheless, the
trumpet remained the least accurately identified
tone quality. The oboe steady-state was most
influenced by the attacks of other instruments,
while the oboe attack had the least influence
on the steady-states of other instruments.

The trumpet attack provided a great deal 
of identification information in combination with 
the trumpet steady-state. While, in combination
with other steady-states, the trumpet attack
did not provide as much identification information
as the clarinet attack, it provided more
information than the flute and oboe attacks. The trumpet
steady-state by itself or in combination with 
other attacks was easily confused or identified 
as the attacking instrument, and the trumpet 
attack with other steady-states did not provide 
as much information as one might anticipate, in 
light of its effect on the trumpet steady-state. 

The overall results of this study show that
identification of timbre becomes less accurate 
(statistically significant beyond the .01 level) 
as the tone progresses from normal, to no-attack, 
to altered. That is, the attack affects aural 
recognition of timbre. (See Appendix II for a 
detailed discussion, and for tables of results.) 



C. Information Processing and Musical Perception



It is a commonly held notion that speech 
and music occupy different levels of brain 
function. Speech perception is considered to 
be a highly ordered process, involving complex 
decoding and pattern recognition techniques. 

Music, by contrast, is often considered to be a
visceral activity, involving the lower brain
centers, the emotions, and the rhythms of body
movement and heart-beat.

This research is an initial attempt to test
an entirely different model, in which music and
speech are considered as analogous functions.
That is, that musical perception is also a
highly ordered process, subject to some of the
same constraints and peculiarities that are a basic
part of speech perception.

Man is the only mammal whose brain shows
evidence of a strong functional asymmetry between
the cortical hemispheres (Sperry, 1964). This
unique adaptation, which reduces the redundancy and
increases the functional capability of the brain,
is closely related to the development of language.
There is very strong evidence that between the
ages of 1 and 6 in the human child, the language
function is (normally) taken over by the left 
hemisphere, and that some musical and higher 
order visual processes are performed in the right 
hemisphere. 

Evidence for the functional asymmetry of the 
human cerebral hemispheres comes primarily from 
four separate areas of investigation: 

1. Tests of functional limitations in
brain damaged patients for whom the location and
extent of damage is known (Luria, 1970;
Mountcastle, 1962).

2. Electroencephalographic studies (Cohn,
1971).

3. Tests of epileptic patients who have
undergone a sectioning of the corpus callosum
(the inter-connecting structure between the
cerebral hemispheres) (Gazzaniga, 1967).

4. Experiments on normal subjects using the
technique of "dichotic presentation" (presentation
of simultaneous but different stimuli to each
ear). The success of this technique in determining
the locus of processing for a given input stimulus
is based upon the hypothesis that the
representation for the stimulus presented to each ear is
greater for the contralateral than for the
ipsilateral hemisphere (Shankweiler and Studdert-
Kennedy, 1967; Kimura, 1961; Knox and Kimura,
1970). Many experiments with speech stimuli have
been conducted using the dichotic technique, but
only two have involved musical stimuli (Kimura,
1964; Shankweiler, 1966). Both presented different
melodies to each ear and found a left ear recall
preference.

Because of the very limited nature of the
investigations specific to music, and the many
statements linking music variously with speech
and non-speech (Warren, 1971), a pilot study was
instituted in order to establish a basis for
tentative brain function models in music.
Shankweiler and Studdert-Kennedy (1967) showed
a right ear superiority for consonants, but none
for vowels. They stated "in view of Kimura's
finding (1964) of a left ear advantage for
musical melody recognition, as against a right
ear advantage for spoken digits, the neutral
status of steady state vowels, midway, as it were,
between speech and music, is perhaps not
surprising" (p. 60, Shankweiler and Studdert-Kennedy,
1967).



Melody recognition is of course only one of
a large number of responses available to a
moderately "literate" listener. It is possible,
using plausibility arguments, to construct a
comprehensive multi-level analogy between speech
and music, from the phoneme level to the breath
group and sentence. The analogy is, however,
only an interesting exercise if it is not
found to be in some way predictive of similar
perceptual processes. The corresponding
structural elements in music then, by implication, involve
cues for decoding and defining the structure
generating a particular musical "message".

The phoneme is the building block for natural
language, with classification categories of vowels,
consonants, semi-vowels, liquids, etc. In speech,
the consonant has been shown to require a complex
decoding process involving primarily the left
hemisphere (Broadbent and Gregory, 1964).
Spectral envelope changes occurring within 20
to 60 milliseconds provide the primary
identification cues (Liberman, et al., 1967). The musical
analogy to the consonant /vowel/consonant sequence 
is the attack/steady-state/decay sequence for 
an instrumental tone. In speech, the consonant 
environment conditions the perception of vowels, 
and consonants provide a much greater reduction 
of uncertainty than do vowel sounds. A number 
of studies have shown that the attack portion of 
an instrument tone also has these attributes 
(Stumpf, 1926; Tenney, 1965; Risset and Mathews, 
1969; Strong and Clark, 1967a, 1967b). 

Since the phoneme is the fundamental speech 
segment, features such as length, stress and tone 
that span more than one segment are designated
"suprasegmental". These suprasegmental features
have direct counterparts in the musical phrase: 
stress, rubato, portamento, articulation, dynamic 
contour, timbre modification. 

The musical counterpart to syntax has been 
extensively documented under the general category 
of melodic and harmonic structure. 

In order to test the analogy it seemed 
appropriate to start at the most basic level, 
that of the phoneme. The method of dichotic 
presentation provides a clear operational test 
of the music/speech analogy. Are instrument 
attacks more readily identified when presented 
to the right ear, with the implication of left 
hemisphere processing? 

To answer this question, six musical
instruments (trumpet, violin, guitar, oboe, clarinet,
flute) were recorded while producing a concert
"middle-C" (nominal frequency, 261.6 Hz). These
instruments were chosen to represent the major
classes of sound production in music, with the
exception of percussion. The violin was bowed
and the guitar plucked. The flute was replaced
with a small bottle tuned to 261.6 Hz, because of
the very long attack transient for this low note
in the flute's range.

The "attack" portion of each tone was 
defined by the experimenter on the basis of 
oscilloscope traces. The durations chosen were:
trumpet, 45 ms; violin, 43 ms; guitar, 33 ms;
oboe, 47 ms; clarinet, 55 ms; "flute", 65 ms.
The tones were recorded a second time with the
steady-state portion of the original instrument
replaced by an electronically produced triangular
wave at approximately the same pitch and phase
(at the transition point) as the original signal.
Channel switching was accomplished using voltage
controlled amplifiers, envelope generators and
followers, trigger delays and oscillators from
a Moog synthesizer.

The six hybrid sounds were then aligned and
recorded on two separate tape tracks, so that all
15 pairings of dissimilar instruments were
represented. A random sequence of 30 items
(each stimulus presented twice, with channels
reversed on the second presentation) was prepared,
along with a single channel training sequence.
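
The pairing arithmetic can be illustrated with a short sketch.
The (left, right) tuple layout, the seed, and the shuffle are
our assumptions about how such a sequence might be generated.

```python
import itertools
import random

INSTRUMENTS = ["trumpet", "violin", "guitar", "oboe", "clarinet", "flute"]


def build_dichotic_items(seed=0):
    """Return the 15 unordered pairs and a shuffled 30-item sequence."""
    pairs = list(itertools.combinations(INSTRUMENTS, 2))
    # Each pair appears twice, the second time with channels reversed.
    items = [(a, b) for a, b in pairs] + [(b, a) for a, b in pairs]
    random.Random(seed).shuffle(items)
    return pairs, items
```

With six instruments there are 6 x 5 / 2 = 15 dissimilar pairs,
and reversing the channels gives 30 items in which every
instrument is heard five times in each ear.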

Four subjects were taught to label the sounds
reliably (the composite tones could be discriminated,
but were not initially identifiable). They then
listened twice to the random sequence of 30
stimulus pairs presented dichotically. For the
second presentation, the earphones were reversed,
to correct for any imbalance between the two
electronic channels. Subjects were asked to
identify both instruments in a stimulus pair,
guessing if necessary. The scoring was based
on responses for which only one instrument was
correct. All subjects responded at well above
the chance level: the average number of
presentations (out of sixty) for which at least one
instrument was correct was 52.

All subjects showed a right-ear advantage 
for identifying instrument attacks. The right ear 
average was 24.25 correct responses, compared
to a 14.25 average for the left ear (difference
significant at p < .05). This result lays the
foundation for further experiments to detail the
processing analogy between speech and music.

The research in speech over the last 20 
years has progressed steadily on the problem 
of human information processing. The study 
of the perception of music has received little 
attention from this point of view, and comparative 
music/speech studies are difficult to find
(Slawson, 1953; Kimura, 1964). This neglect is
possibly due to the caricature of music, fostered
even by musicians, as a "visceral", "non-intellectual"
activity.

It is our contention that music is as 
complex a pattern recognition activity, even for 
the casual listener, as is speech. In fact, it 
seems reasonable to suppose that the structure 
of music may provide a more useful map of 
the brain's symbol manipulation process than speech,
since fewer constraints (cultural, articulatory,
referential) are imposed upon the development 
of musical patterns and modes of production. 

Implicit in this surmise is the idea that both 
speech and music map to subsets of a generating 
structure which is more general than the structures 
delimited by "speech" and "music". In addition, 
this general structure, while very large in terms
of storage, and very flexible in terms of pattern
organization, cannot be considered infinite.

In 1963, H. Bremermann showed that the
fundamental coarseness of matter does not allow it to
transmit more than 1.6 x 10^47 bits per gram per
second. Ashby has pointed out that, for
combinatorial interactions, even a very moderate-
appearing situation, such as a square screen of
20 by 20 lamps, provides enormously more possible
patterns than those that could be processed by
any device the size of the brain, over a lifetime.
The number of discriminable patterns or pattern
classifications must be very small indeed,
compared to these fundamental limits.
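
The scale of this comparison can be checked with a rough
calculation. The brain mass of 1500 grams, the 80-year
lifetime, and the assumption that each lamp is simply on or
off are ours, chosen only for illustration.

```python
BITS_PER_GRAM_SECOND = 1.6e47        # limit quoted in the text
BRAIN_MASS_GRAMS = 1500              # assumed, not from the report
LIFETIME_YEARS = 80                  # assumed, not from the report
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def lifetime_bit_budget():
    """Upper bound on bits handled by a brain-sized device in a lifetime."""
    return (BITS_PER_GRAM_SECOND * BRAIN_MASS_GRAMS
            * LIFETIME_YEARS * SECONDS_PER_YEAR)


def lamp_patterns(rows=20, cols=20):
    """Number of on/off patterns on a rows-by-cols screen of lamps."""
    return 2 ** (rows * cols)
```

Under these assumptions the budget is on the order of 10^59
bits, while the 400 lamps allow 2^400 patterns, a number of
121 decimal digits.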

In speech the recognizable articulatory
patterns are limited to about 60 phonemes for all
the world's languages and only a subset of these
is used in any one language. In addition,
their identification is not fixed, but is, to
some extent, context dependent. In music, the
"phonemic" elements are tones of distinct pitch,
timbre, loudness and duration. Listener "enjoyment"
or appreciation of sequences and combinations of
these tones is based on the ability to recognize the
sequence of tones as belonging to a class of
patterns previously delimited (in memory) by implicit
rules of musical grammar. It should be emphasized
that for the general listener, this recognition
may not be recoverable; that is, he cannot
specify the basis upon which his decision is
made. This is analogous to the language auditor
who responds "Da" when "Ba" has been presented,
but has no awareness of the distinctive features
which he uses to discriminate "Ba" from "Da".

Referring to a higher linguistic level, 
Langacker (1968, p. 234) states: 

When the child has learned to talk,
when he has mastered his native tongue,
he is in possession of an abstract system
of rules that specify an unbounded class
of well-formed sentences. He is not
conscious that he possesses this system
in the sense that its structural patterns
have been imposed on his psychological
processes, so that these patterns are a
factor in determining the course of his
verbal activity. Learning to talk, like
learning to ride a bicycle, involves the
mastery of a set of principles; it involves
the addition of structure to the body of
psychological skill or competence that shapes
our mentally directed behavior. These rules
are thus no more accessible to conscious
inspection than the rules for keeping
one's balance while riding a bike. We
talk and we keep our balance on bicycles,
but in neither case do we know, at the level
of consciousness, precisely what the
guiding principles are.

It is clear that for language the
developmental period from birth to about six years old is
a very critical one, in the sense that sound
patterns not learned during this period are only
learned with great difficulty, if at all, at a
later age. A similar argument applies to syntax,
although at this level, usable ex post facto 
rules can be devised to approximate the structure 
"discovered" by every normal child before the 
age of five. 

This period (to approximately age 6) may
provide a "learning window" for auditory pattern
classification that is closed when the hemispheric
asymmetry development is completed. In the case
of language, participation in language production
undoubtedly provides a necessary step in the
formation of a complete language generating and
decoding system. Sussman (1972) states "the
versatility of the human speech production system
and an increasing body of evidence suggests that
speech is controlled by an intricate closed-loop
feedback system. To bring about feedback control
of the speech musculature, the higher neural centers
should be kept constantly aware of (a) the spatial
position, (b) the direction of movement, and (c)
the rate of movement of the articulators. This
review describes the feedback mechanisms existing
within the tongue that can mediate such dynamic
space-time information."

Traditional means of musical production depend
upon control systems (finger, hand, and arm muscles;
breath control contrary to that needed for speech)
which are not sufficiently developed in the young
child to allow his exploitation of musical patterns
to the degree that he explores speech. The
"learning window" idea provides testable hypotheses
concerning musical "talent" (musical facility
demonstrated in production). There may, for
example, be a critical overlap between the
"learning window" and the means for exploring and
extemporizing musical patterns. The anecdotal
immaturity of many professional musicians may
have a close relationship to their musical ability:
a delayed "closing" of the learning window may
cause sufficient overlap with the developing
muscle control to provide the necessary feedback
contingencies (see Figure 2).

This model appears to have broad implications
for many areas of human intellectual development.
The role of musical pattern recognition in this
model is dependent upon its possible inclusion
under speech or speech-like processes. If the
experiments described in this report show
conclusively that some non-speech sounds are
decoded in the left hemisphere (and if, in addition,
they are found to produce categorical decisions:
Liberman, et al., 1962) then it is unlikely that
left hemisphere processing is completely limited 
by a finite store of feature extractors possessing 
some invariant relationship to a linguistic 
category. It is more likely that a general 
classification cue, such as a critical rate 
of change in the spectral envelope, determines the 
processing site. A synthesized set of sounds with
variations of temporal and spectral attack
parameters is being prepared in an attempt to isolate
a processing cue which is not exclusively linguistic.

In view of our findings taken in conjunction
with those of Kimura (1957) and others, the
music/speech analogy may have implications for
reading problems, such as congenital dyslexia,
where anecdotal accounts of unusual musical
ability (non-notational) have up until now been
considered as primarily compensatory adaptations.
The problems of autistic children appear to be
closely connected with an inability to encode
language ("Autism: A Deficiency in Context-
Dependent Processes?", Pribram, 1970).
Anecdotes of musical precocity in some autistic
children, and the role music plays in some
treatment centers such as Benhaven in New Haven,
Conn., indicate a close, complex relationship
between speech and music for this disability.

It seems to be of basic importance, however,
to investigate the scope of left hemisphere 
auditory processing, particularly when it appears 
to extend beyond the range of stimuli considered in 
previous investigations. (See Figure 3.) 



D. Systems for Performance Analysis and Modification



1. The Computer Analysis System 

A large capacity digital computer provides a
wide range of processing capabilities and is a
necessity for reducing large amounts of data to
useful descriptors. In order to prepare auditory
material for digital processing, an analog to
digital converter must be used. The analog 
(continuously varying) signal from an audio tape 
is converted to digital (discrete) values and is 
stored on magnetic tape in a form suitable to the 
input requirements of an IBM 360 digital computer. 
Two input formats are available: 

a. Five bits + sign, at 20K words per second,
which provides a 30 db dynamic range up to
10K Hz.

b. Nine bits + sign, at 10K words per second,
which provides a 54 db dynamic range up to
5K Hz.
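
The figures quoted for the two formats agree with the usual
rules of thumb of about 6 db per magnitude bit and a bandwidth
of half the sampling rate. The sketch below is our own check,
not part of the system software; the function names are
hypothetical.

```python
import math


def dynamic_range_db(magnitude_bits):
    """Ratio of largest to smallest magnitude step, in decibels."""
    return 20 * math.log10(2 ** magnitude_bits)


def bandwidth_hz(words_per_second):
    """Highest frequency representable at the given sampling rate."""
    return words_per_second / 2
```

Five magnitude bits give about 30.1 db and nine give about
54.2 db, while 20K and 10K words per second give bandwidths of
10K Hz and 5K Hz.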

Duration, frequency, intensity, harmonic and
noise analysis can readily be accomplished with
the appropriate computer programs. In addition, 
programs for performance modification open many 
possibilities for investigation, since performances 
on traditional instruments can be altered slightly 
for controlled effects without destroying the 
naturalness of the processed sound. Figure 4 
shows the system schematically, and Appendix III 
describes the software developed for the system.

2. Tone Line Writer 

The tone line writer produces a visual record 
on graph paper of the fundamental frequency and 
amplitude contours of a recorded performance,
over a three octave range (133 Hz to 1067 Hz;
approximately C below middle C to the soprano high
C). A modified FM Subcarrier Discriminator
(a standard telemetry device) serves as an accurate 
frequency meter. An octave switch, designed 
specifically for this system, automatically sets 
the proper range, substantially reducing the 
problems associated with tones containing strong 
second and third harmonics. 
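
One way to picture the octave switch is as a selector among
three one-octave bands. The band edges below are our
assumption, obtained by halving and doubling the 266.7 Hz to
533.3 Hz discriminator range; the report does not give the
switching thresholds.

```python
# Assumed band edges in Hz: three adjacent octaves spanning 133 to 1067 Hz.
OCTAVE_BANDS_HZ = [(133.3, 266.7), (266.7, 533.3), (533.3, 1066.7)]


def select_octave(freq_hz):
    """Return the index (0, 1 or 2) of the band containing freq_hz."""
    for index, (low, high) in enumerate(OCTAVE_BANDS_HZ):
        if low <= freq_hz < high:
            return index
    raise ValueError("frequency outside the three octave range")
```

Under this assumption middle C (261.6 Hz) falls in the lowest
band and A 440 in the middle band.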

The graphs produced by this system allow 
visual inspection and measurement of the durations 
of tones and silences, and the vibrational
correlates of vibrato (pitch modulation), tremolo
(loudness modulation) and other vibrational
characteristics. Figure 5 is a schematic of the
tone line writer.



3. Tone Line Reader 

The tone line reader accepts hand drawn charts 
representing pitch and loudness contours, and 
converts them to an auditory output by means of 
a voltage controlled synthesizer, such as the Moog.
The optical scanner or "chart reader" is a modified
document transmitter built by Graphic Transmission 
Systems, Inc. A two level signal from the scanner 
indicates the presence or absence of a line on 
the chart. The sample and hold unit determines 
the voltage appropriate for a line-indicating- 
pulse at each point in the scan, and then presents 
that voltage, when there is a pulse, to the 
proper sound control unit. 

Chart paper is fed to the scanner at one inch 
per second, and sixty scans are made each second. 

Both pitch and loudness indications can be scaled 
at will within the limits of the 8 1/2 inch scan
width. The system is diagrammed in Figure 6. The
electronic interface, which converts the output
of the optical scanner to a suitable control
voltage, is described in Appendix IV.
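
The scan geometry implies a simple mapping from chart position
to time and control voltage, sketched below. The scan rate and
scan width are taken from the text; the linear mapping and the
0 to 5 volt range are hypothetical, since the actual interface
is described only in Appendix IV.

```python
SCANS_PER_SECOND = 60        # from the text
SCAN_WIDTH_INCHES = 8.5      # from the text


def scan_time_s(scan_index):
    """Time along the chart at which a given scan is made."""
    return scan_index / SCANS_PER_SECOND


def control_voltage(position_inches, v_min=0.0, v_max=5.0):
    """Map a line position across the scan to a control voltage.

    The linear scale and the voltage limits are assumptions.
    """
    if not 0.0 <= position_inches <= SCAN_WIDTH_INCHES:
        raise ValueError("position outside the scan width")
    return v_min + (v_max - v_min) * position_inches / SCAN_WIDTH_INCHES
```

At one inch of chart per second, each scan therefore covers
1/60 inch of chart, or 1/60 second of sound.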



4. Harmonic Synthesizer 

An harmonic synthesizer with both phase and 
harmonic amplitude control would provide an
invaluable tool for investigating the timbre changes
necessary and sufficient for a particular listener 
response to a musical sequence of tones. Basic 
research on the pitch analysing processes used by 
the ear, the appearance of subjective tones, and 
the tonal characteristics of instruments would be 
greatly facilitated by such a device. 

In view of this, a graduate student in the 
University of Connecticut Electrical Engineering 
Department was engaged to provide a prototype 
design for the device, and to simulate the output 
of the device using a digital computer. The results 
of this investigation are presented in Appendix V. 
The cost of building the synthesizer was outside 
the limits set by the present funding. 

Appendix I 

COMPUTER ANALYSIS OF MUSICAL PERFORMANCE 
Warren C. Campbell 



PURPOSE AND PROBLEM 



This investigation is an attempt to add to the body of 
knowledge related to the following questions: 

1. Are there operationally definable (i.e., objective, 
measurable) variables for which musical performance 
limits can be established? 

2. If so, what are they and where are the limits of 
acceptability for various performance situations? 

3. Can a knowledge of such variables be made useful 
in solving the problems of Music Education? 

Given a set of scores assigned by a group of competent 
human judges to a set of student musical performances, are 
there objective features of the recorded performance which 
can be used, with suitable computer analysis, to predict 
the averaged judges' score? 



PROCEDURES 



Figure 1 provides an overview of the procedures. Two 
paths can be traced through the diagram, one for the human 
judging, the other for the computer simulation. Note that 
the simulation requires a representative set of the judges' 
scores in order to predict the judges' responses on other 
samples from the same population. 

As shown in the computer branch of the diagram, aspects 
of the recorded performances were transferred to paper tape, 
using the equipment described in the previous chapter. The 
frequency meter produced a chart of performance frequency 
versus time, and the envelope follower presented the output 
voltage envelope (signal amplitude versus time). 

Hand processing of the paper charts, described in this 
section, was used to extract the feature values for each 
note. Normalization and conversion of the chart values, and 
the subsequent reduction of the feature values to a small 
set of predictors were accomplished using special programs 
on an IBM 360 computer. 

Multiple regression analysis, using these predictors 
and based on a set of judges' scores, produced a prediction 
equation which was then cross-validated on subsets of the 
sample. The cross-validation, accomplished by correlating 
the predicted scores with the actual judges' scores 
for the subset, is shown schematically at the bottom of 
Figure 1. 



PERFORMANCE SELECTION AND PREPARATION 

Fifty audio tapes, consisting of vocal and instrumental 
performances given at the 1968 Connecticut All-State 
Auditions, were auditioned to find a suitable data base for 
the present investigation. Of these, it was found that only 
the female vocal performances had a pitch range that could 
be processed satisfactorily using the available equipment. 

A preliminary study was conducted to determine if a 
stable criterion could be established for this kind of 
short vocal performance, and to test the judging categories, 
instructions and format of the judging form. Reliabilities 
for eleven judges in this preliminary setting ranged from 
.70 to .89, and were encouraging enough to warrant the 
present investigation. 

With the information available from this preliminary 
study, a set of sixty-two vocal performances by sopranos 
and altos was taken from the original audition tapes, 
and was used to prepare an analysis tape and a judging 
tape. 



Performers auditioning were allowed to choose either 
the aria "If With All Your Hearts" from Mendelssohn’s 
"Elijah", or "He Shall Feed His Flock" from Handel’s 
"The Messiah". The portions of these selections used 
for performance judging are shown in Appendix A. 

Thirteen performers sang the Mendelssohn aria. These 
auditions were divided, so that each phrase starting with 
"If With All..." was treated as a separate performance. 
These twenty-six performances were presented on the 
judging tape so that the judges could not readily connect 
the two segments sung by one person. The remaining 
thirty-six performers sang the Handel aria, bringing the 
total number of performances to sixty-two. The duration 
of the Mendelssohn performances ranged from seven to 
twenty-two seconds. The duration of the Handel segment 
ranged from thirteen to forty-two seconds. 

The judging tape was made by copying the original 
taped performance at its nominal recording speed, preceded 
by a performance number, and followed by a ten second silence. 
The running time of the judging tape was approximately 45 
minutes. Seven copies of the original judging tape were 
made, so that each judge could audition the tape at his 
own rate and convenience. 

[Figure 1. Overview of Procedures] 

The analysis tape was made by copying the original 
tapes on a variable speed recorder (Sony, Model TC-5600). 
By this means, the frequency range of the original 
performance was shifted to coincide with the range of the FM 
Subcarrier Discriminator. 



CRITERION SCORES 



Selection of Judges 

The judges chosen for the panel were required to be 
experienced music teachers, some at the high school and some 
at the college level. Nine people fitting this description 
were contacted. Of the nine, seven were able to complete 
the judging task; of the other two, one was too busy, and 
the other found the task too difficult to do to his 
satisfaction. All the judges found the assignment to be an 
arduous one, because of the narrow range of abilities 
represented in the performance sample. 

Directions to the Judges 

The performance judges were asked for scores on a five 
point scale for intonation, vibrato, rhythm, dynamics, and 
an "overall" rating. The average of the first four categories 
was also calculated and used as an additional criterion 
value. The instructions given to the judges and the judging 
form are reproduced in Appendix A. The judging form 
contains five columns, one for each category. Each column 
was subdivided with headings "A" through "E", with "A" 
representing the best performances. The judge had only to 
check the letter category for each performance under each 
of five columns. Numerical results were card punched from 
these sheets with A=1, B=2, etc. Judges who completed 
the task were paid a fee of twenty dollars. 

Each judge played the tape through at least twice. 
Several judges commented that the dynamics and rhythm 
categories were the most difficult to grade, and suggested 
that a diagnostic grading scheme might make their job easier. 

Calculation of Criterion Scores 

The grades assigned by the judges are essentially 
ordinal. When numerical values are substituted for the letter 
grades, and then averaged to provide a criterion score, an 
interval scale is implied. The tacit assumption is made that 
the "distance" between the "A" and "B" categories is the same 
as the "distance" between all other adjacent categories, and 
that, for example, an average score of 1.5 is "halfway" 
between "A" and "B". 



In addition, there is an assumption about the assignment 
of categories by the judge when the composite score 
used is the arithmetic mean: the scores must be considered 
as measurements which differ in a random fashion from a 
hypothetical "true" score, which represents the performance. 
It is only under this assumption that the average is a 
meaningful estimate of the performance score. If, for example, two 
schools of thought were represented in the panel of judges, 
with systematic differences between them, no single number 
would be a reasonable representation of the judges' opinion. 
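
The coding and averaging step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the example grades are hypothetical and do not come from the study's own programs, and the sketch inherits the interval-scale and random-error assumptions stated in the text.

```python
from statistics import mean

# Letter-to-number coding described in the text: A=1, B=2, and so on to E=5.
GRADE_VALUES = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def criterion_score(letter_grades):
    """Average the numeric equivalents of the judges' letter grades
    for one performance in one judging category."""
    return mean(GRADE_VALUES[grade] for grade in letter_grades)


# Hypothetical grades from seven judges (illustrative values only).
example_grades = ["A", "B", "B", "A", "C", "B", "A"]
example_score = criterion_score(example_grades)
```

A score of 1.5 from this function corresponds to the "halfway between A and B" reading discussed above, which is meaningful only if the categories are treated as equally spaced.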



Stability of the Criterion Scores 

The stability of the criterion scores can be estimated 
from the inter-judge correlations, if the assumptions made 
in the previous section hold. The inter-judge correlations 
and a reliability estimate were calculated for the seven 
faculty judges. In addition, the correlation of each judge's 
scores with the average of the other six judges' scores was 
calculated as an indication of his agreement with the group 
opinion. 

In order to test the reliability estimate, a class of 
nine graduate music students was asked to audition the same 
audio tape heard by the seven faculty judges. The average 
of the graduate students' scores was compared to the average 
of the faculty scores by calculating a correlation for 
each of the grading categories. 

Three of the faculty judges consented to regrade the 
sample on a "one pass" basis, so that self-correlations 
could be compared with the inter-judge and inter-group 
correlations. 

Considerable inter-category correlation was expected, 
based on the results obtained by Page and Paulus (1968). 
The stability of the less clearly defined categories was 
expected to benefit from this "halo" effect. 

PREDICTOR VARIABLES 



Selection of Features 

The computer simulation of a criterion score is a 
problem in pattern recognition. The features of the pattern 
upon which recognition is to be based are crucial to the 
success of a simulation. The selection, in this case, of the 
physical variables to be used as features depends on an 
assumption of a correlational relationship between the physical 
variables and the subjective responses of the judges. 

The characteristic features chosen are listed in Table 1. 
These six values were calculated from measurements taken on each 
tone of the performances. It was expected that, while 
considerable intercorrelation would occur, the pitch features 
would be the primary intonation predictors, the power level 
would be the primary dynamics predictor, and the duration 
features would be the primary rhythm predictors. Because 
of the tedious hand processing necessary, no features were 
extracted specifically for vibrato. 



Table 1 

Characteristic Features, Calculated for Each Tone 

1. Initial Pitch                 (cents) 
2. Middle Pitch                  (cents) 
3. Final Pitch                   (cents) 
4. Sound Power Level             (decibels) 
5. Duration of Tone              (seconds) 
6. Duration of Break or 
   Glissando between Tones       (seconds) 




Feature Extraction 

Delays in obtaining the equipment needed for machine 
processing made it necessary to employ hand processing 
techniques to extract the feature values from the recordings. 

The paper tapes produced by the frequency meter and 
the rectifier were first segmented to define the onset and 
release of each tone sung. Tonal and non-tonal segments 
could then be measured along the time axis. The length of 
these segments represented the duration of a particular note 
sung by the performer, and the break or glissando leading 
to the next note. Because of noise and the decay 
characteristics of the system, no distinction was made between 
glissandi and breaks. Machine segmenting, while not a trivial 
problem, will undoubtedly provide more consistent results 
with much greater flexibility. 






Figures 2 and 3 show excerpts from two performances as 
represented on the paper tapes, with vertical lines indicat- 
ing the segmenting. 

After segmenting, average pitch lines were drawn for 
each tonal segment by estimating the center of the vibrato 
envelope. Three ordinate values were tabulated from each 
tonal segment, at the beginning, middle, and end of the 
average pitch line. The maximum amplitude value for each 
tonal segment was also tabulated. These values are also 
indicated in the diagrams. Data relating directly to vibrato 
characteristics was not used. Because of the large variations 
in vibrato within single performances, this data was left for 
machine reduction at a later time. 

The tabulation of these data resulted in an N by 6 data 
matrix for each of the sixty-two performances, where N is the 
number of tones scored for each performance. The initial 
frequency of the original taped performance was used to 
calculate the change in tape speed, and to establish the 
original frequency of all the performance tones, since they 
would be affected in the same proportion by the slowing or 
speeding of the audio tape. The tabulated data matrix for 
each performance was transferred to punched cards, which 
were then proofread to assure their accuracy. 



Data Normalization 

Since the speed of the original recorded performance had 
been altered to fit the frequency window of the conversion 
equipment, the original frequencies and durations were 
recovered from the data matrix using calibration factors 
determined during the processing of the auditory signal. 

The original frequencies were then normalized to a cents 
scale, referenced to the starting pitch chosen by the singer. 
This made it possible to compare pitch deviations on a scale 
which is independent of starting frequency. A pitch change 
of one cent is defined as the change from $f_0$ to $f$ when 
$f = f_0 \times 2^{1/1200}$. This frequency change represents a pitch 
change of 1/100 of a semitone of the tempered scale. Visual 
estimation of chart lines led to confidence limits of 
approximately ±4 cents on each reading, a limitation which 
will be eliminated when machine readout is available. The 
sound level chart readings (output volts) were converted 
to the decibel scale, referenced to the minimum signal 
produced by the singer. Tonal and non-tonal durations were 
converted to proportions of the performance duration, so 
that differences in tempo would not affect the predictors, 
but only relative differences in tone durations. These 
normalizations did not, of course, provide any data reduction, 
but only served to facilitate the next step, which was to 
compare each of the performances to a standard performance. 
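
The three normalizations can be illustrated with a short Python sketch. The function names are hypothetical, and the decibel conversion assumes that the chart reading is a voltage amplitude, so that the level is 20 times the base-10 logarithm of the voltage ratio; the text does not state which convention the original programs (Appendix B) used.

```python
from math import log2, log10


def to_cents(freq_hz, ref_hz):
    """Pitch distance from ref_hz in cents (1200 cents per octave,
    100 cents per tempered semitone)."""
    return 1200.0 * log2(freq_hz / ref_hz)


def to_decibels(volts, ref_volts):
    """Level relative to ref_volts, in decibels.
    Assumption: the reading is an amplitude, hence the factor of 20."""
    return 20.0 * log10(volts / ref_volts)


def to_proportions(durations):
    """Express each duration as a fraction of the total duration,
    which removes the effect of overall tempo."""
    total = sum(durations)
    return [d / total for d in durations]


# Illustrative values only: a tone one tempered semitone above a 440 Hz start.
semitone_up = to_cents(440.0 * 2 ** (1 / 12), 440.0)
relative_level = to_decibels(10.0, 1.0)
shares = to_proportions([1.0, 2.0, 1.0])
```

Referencing pitch to the singer's own starting tone and level to the singer's own minimum signal makes the values comparable across performances recorded at different absolute pitches and gains.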



Data Reduction and Calculation of Predictors 

The approach taken for data reduction was to establish 
a set of standard values for each tone of the musical 
selection. Deviations from the standard were then calculated for 
each performance, and the average deviations and mean square 
deviations from the standard were determined for each of the 
six features. These twelve numbers were the basic predictor 
set used in the multiple regression equation. A thirteenth 
predictor, the ratio of total tonal duration to total 
non-tonal duration, was added when it was found to have low 
correlation with the most effective of the other twelve 
predictors. 
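
The reduction from an N by 6 matrix to thirteen predictors might be sketched as follows. The feature names, the row layout, and the example values are hypothetical, and the sketch assumes that "average deviation" means the signed mean of (performance minus standard); the text does not say whether signed or absolute deviations were averaged.

```python
# Column order assumed to follow Table 1.
FEATURES = (
    "initial_pitch", "middle_pitch", "final_pitch",
    "sound_power", "tone_duration", "break_duration",
)


def reduce_performance(performance, standard):
    """performance and standard are lists of N rows, one row per tone,
    each row holding the six normalized feature values.
    Returns a dict of thirteen predictors."""
    n = len(performance)
    predictors = {}
    for k, name in enumerate(FEATURES):
        deviations = [p[k] - s[k] for p, s in zip(performance, standard)]
        predictors[name + "_mean_dev"] = sum(deviations) / n
        predictors[name + "_mean_sq_dev"] = sum(d * d for d in deviations) / n
    # The thirteenth predictor does not depend on the standard.
    # It is undefined if the performance contains no breaks at all.
    tonal = sum(row[4] for row in performance)
    non_tonal = sum(row[5] for row in performance)
    predictors["duration_ratio"] = tonal / non_tonal
    return predictors


# Hypothetical two-tone excerpt (illustrative values only).
example_performance = [(0, 5, 10, 3, 0.4, 0.1), (200, 205, 195, 6, 0.3, 0.2)]
example_standard = [(0, 0, 0, 3, 0.5, 0.0), (200, 200, 200, 5, 0.5, 0.0)]
example_predictors = reduce_performance(example_performance, example_standard)
```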

The data reduction program was run twice, using two 
different sets of standard values. The first set was from an 
operational standard: the best performance in each selection 
category (Mendelssohn 1, Mendelssohn 2, and Handel). The 
second set of standard values was obtained by using a literal 
transformation of the notation of the musical score: the three 
pitch values for each tone were identical, having the value, 
in cents, appropriate to the score indication for that tone; the 
duration values were proportional to the note durations in the 
printed score; non-tonal durations were zero, except where rests 
were indicated. 

Since no score indications are given, sound power level 
was established by rule for the literal standard. Appropriate 
sound power levels appear to be, to a first approximation, pitch 
dependent. Therefore, decibel levels for the literal standard 
were made proportional to the score value for pitch referenced 
to the lowest tone in the sequence. 

The primary differences between the operational and the 
literal standards can be listed as follows: 

1. Pitch - The initial and final pitches in the operational 
standard vary considerably from the nominal pitch of the literal 
standard, especially on transition tones. 

2. Duration - Tonal durations deviate from the nominal 
values, and short breaks appear between almost all tones of 
the operational standard. The duration ratio, since it is not 
referenced to a standard, is the same for both sets of predictors. 

The programs used for data normalization and reduction are 
listed in Appendix B. 

ANALYSIS 



Multiple Regression 

A step-wise multiple regression analysis was run for both 
sets of predictors with the average of the seven faculty judges' 
scores as the criterion. Each set of predictors was tested 
on the complete set of 62 performances, as well as on two 
partitions of the total set: a random split of 31 performances 
each, designated "31A" and "31B", and the natural subsets 
delineated by the Handel and Mendelssohn selections, designated 
respectively "36" and "26" in the tables, indicating the 
number of performances in each subset. 

The use of all thirteen predictors in the linear 
regression equation for any subset increases the error variance 
accounted for, and therefore decreases the generalizability 
of the resulting predictor weights. In order to maintain a 
more nearly constant performance-predictor ratio, a selection 
of six predictors was made, and a new multiple regression 
equation using only these six predictors was calculated for 
each subset. The predictors were chosen by ranking, for each 
judging category, the predictors in the order in which they 
contributed to the reduction of the sum of squares. The 
six finally used were selected from those of the original 
thirteen which appeared most often in the upper half of the 
ranking for each of the six judging categories. The predictors 
used for the subsets are listed in Table 2. 

There is no guarantee that these are the best combinations 
of predictors to select, since any one of them may be reducing 
primarily error variance. In that case, there would be 
little generalizability to other samples. However, if a set 
of predictors and b-weights, generated from one subset of the 
sample, satisfactorily predicts the scores from another 
subset, then these predictors may be expected to work on another 
sample from the same population. 



Table 2 

Predictors Used For Subset Calculations 

Operational Standard            Literal Standard 

1. Duration Ratio               1. Duration Ratio 
2. Tonal Duration               2. Non-tonal Duration 
3. Non-tonal Duration           3. Non-tonal Duration, Sq. 
4. Non-tonal Duration, Sq.      4. Middle Pitch 
5. Middle Pitch                 5. Final Pitch 
6. Final Pitch, Sq.             6. Sound Power, Sq. 

Cross Validation 

Because of the small sample size (62 performances), 
cross-validation on any subsets of the original sample using all 
thirteen predictors would give a very poor indication of the 
generalizability of the regression coefficients. Therefore 
only the six most "generally useful" predictors from each 
predictor set, listed in Table 2, were used in the 
cross-validation. A step-wise multiple regression analysis was 
used to determine the b-weights for one subset of the 
performance sample. These linear coefficients were then used to 
calculate estimates of the judges' scores for the remaining 
performances in the sample. 
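
The cross-validation logic can be sketched in plain Python as below. This is not the step-wise procedure used in the study: it fits an ordinary least-squares equation on a fixed set of predictors for one subset, applies those weights unchanged to the other subset, and correlates the predictions with the actual scores. All function names and the example data are hypothetical.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(predictors, scores):
    """Least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, predictors):
    """Apply fitted weights to new predictor rows."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r))
            for r in predictors]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cross_validate(train_x, train_y, test_x, test_y):
    """Fit on one subset, then correlate predictions with the actual
    scores on the other subset."""
    return pearson(predict(fit(train_x, train_y), test_x), test_y)


# Hypothetical noise-free data following score = 1 + 2*x1 - x2.
subset_a_x = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
subset_a_y = [1, 3, 0, 2, 4]
subset_b_x = [(3, 1), (0, 2), (2, 2)]
subset_b_y = [6, -1, 3]
example_weights = fit(subset_a_x, subset_a_y)
example_validity = cross_validate(subset_a_x, subset_a_y, subset_b_x, subset_b_y)
```

With real data the cross-validated correlation is expected to fall below the correlation obtained on the fitting subset, since part of the fitted variance is error variance specific to that subset.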



SUMMARY 

The procedures indicated in this section have been based 
on the program followed by Page and Paulus as presented in The 
Analysis of Essays by Computer (1968). 

After selecting a suitable group of audio-taped vocal 
performances, a panel of judges was given the task of grading 
them in five different categories. The judges' scores were 
averaged to provide a criterion score. Acoustic features were 
extracted from the audio-tape, and this data was normalized, 
and reduced to thirteen predictor variables for each performance, 
using an IBM 360 digital computer. The predictors were used in 
a multiple regression analysis to simulate the criterion score. 
Cross-validation was accomplished by partitioning the sample, and 
predicting the scores for one subset using the regression 
coefficients from the other subset. 

The major departure from the procedures followed by Page and 
Paulus is found in the feature extraction and data reduction 
procedures. The essay, composed of symbols compatible with 
machine input, does not require the extensive pre-processing 
necessary to translate aspects of the acoustic signal into 
a machine readable format. 



RESULTS 

ANALYSIS OF THE CRITERION VARIABLE 
Inter-judge Correlations 

A comparison was made between each judge and his peers for 
each grading category. The results of this comparison, the 
inter-judge correlations, are presented in Table 3. 

If the average of the judges' scores is assumed to be the 

Table 3 

Inter-judge Correlations 
Seven Faculty Judges 
N = 62 

Intonation 

Judge         1         2         3         4         5         6         7 
  1        1.00  [illeg.]       .37       .54       .24       .57       .47 
  2                  1.00       .50       .62       .35       .37       .55 
  3                            1.00       .53       .31       .59       .49 
  4                                      1.00       .42  [illeg.]       .34 
  5                                                1.00       .42       .31 
  6                                                          1.00       .42 
  7                                                                    1.00 

Vibrato 

Judge         1         2         3         4         5         6         7 
  1        1.00       .39       .03       .53       .40  [illeg.]       .39 
  2                  1.00       .19       .50       .31       .47       .28 
  3                            1.00      -.05       .04      -.13      -.10 
  4                                      1.00       .29       .64       .33 
  5                                                1.00       .46       .36 
  6                                                          1.00       .46 
  7                                                                    1.00 

Rhythm 

Judge         1         2         3         4         5         6         7 
  1        1.00       .51  [illeg.]       .24       .21       .39       .28 
  2                  1.00       .35       .19       .21       .22       .27 
  3                            1.00       .29       .07       .37       .23 
  4                                      1.00       .48       .35       .16 
  5                                                1.00       .20       .22 
  6                                                          1.00       .10 
  7                                                                    1.00 

Table 3 (continued) 

Inter-judge Correlations 
Seven Faculty Judges 
N = 62 

Dynamics 

Judge         1         2         3         4         5         6         7 
  1        1.00       .38       .19       .17       .28       .45       .50 
  2                  1.00      -.17       .24       .52       .18       .39 
  3                            1.00       .00       .02       .19       .18 
  4                                      1.00       .14       .37       .18 
  5                                                1.00       .13       .32 
  6                                                          1.00       .21 
  7                                                                    1.00 

Overall 

Judge         1         2         3         4         5         6         7 
  1        1.00       .57       .29  [illeg.]       .42       .58       .47 
  2                  1.00       .42       .48       .43       .55       .42 
  3                            1.00       .34       .20       .42       .31 
  4                                      1.00       .30       .60       .40 
  5                                                1.00       .34       .41 
  6                                                          1.00       .55 
  7                                                                    1.00 

Avg. of 1-4 

Judge         1         2         3         4         5         6         7 
  1        1.00       .54       .36       .53       .36  [illeg.]       .49 
  2                  1.00       .32       .56       .49       .48       .43 
  3                            1.00       .33       .25       .47       .31 
  4                                      1.00       .46       .69       .36 
  5                                                1.00       .56       .40 
  6                                                          1.00       .48 
  7                                                                    1.00 

best estimate of the correct grade for the performance, then 
the correlation of each judge with the average of all the 
other judges can be used as an indication of judging accuracy. 
These correlations are shown in Table 4. By eliminating each 
judge from the average to which he is compared, the problem of 
spurious correlation (Benson, 19^5) is avoided. 

On the basis of the lowest correlation with the average 
for four out of the six categories, Judge #3 may be considered 
the maverick of the group. The validity of this designation 
was reinforced when the same judge regraded the sample as a 
member of the student judging group, and had the lowest 
correlation with the average of this group in five of the six 
categories. 

A summary of the data presented in Table 3 can be made 
by calculating a typical inter-judge correlation for each 
category. This calculation is based on Rozeboom (1966, 
p. 320): the "homogeneity" of the judges is defined as 

$\text{Homogeneity} = \dfrac{\text{Average Proper Covariance between Judges}}{\text{Average Variance}}$. 

A reliability value can be calculated using an equation which 
is analogous to the Spearman-Brown "prophecy" formula, but 
which is based on the homogeneity. This reliability 
estimate, designated "alpha", is given by Rozeboom (1966, 
p. 412) as: 

$\text{Alpha} = \dfrac{n\,(\text{Homogeneity})}{1 + (n-1)\,\text{Homogeneity}}$, 

where "n" is, in this case, the number of judges participating. 
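
A minimal Python sketch of the two quantities follows. The function names and data layout are hypothetical, and the sketch reads "proper covariance" as the covariance between two different judges (the off-diagonal entries of the judges' covariance matrix), which is an assumption about Rozeboom's term rather than something stated in the text.

```python
def homogeneity(scores):
    """scores[j][p] is the numeric grade given by judge j to performance p.
    Returns the average between-judge covariance divided by the average
    within-judge variance."""
    n_judges = len(scores)
    n_perf = len(scores[0])
    means = [sum(row) / n_perf for row in scores]

    def cov(i, j):
        return sum((a - means[i]) * (b - means[j])
                   for a, b in zip(scores[i], scores[j])) / n_perf

    avg_variance = sum(cov(j, j) for j in range(n_judges)) / n_judges
    pairs = [(i, j) for i in range(n_judges) for j in range(i + 1, n_judges)]
    avg_covariance = sum(cov(i, j) for i, j in pairs) / len(pairs)
    return avg_covariance / avg_variance


def alpha_reliability(h, n_judges):
    """Reliability of the average of n_judges judges, given homogeneity h."""
    return n_judges * h / (1.0 + (n_judges - 1) * h)


# A homogeneity of .43 with seven judges gives a reliability of about .84.
example_alpha = alpha_reliability(0.43, 7)
```

The formula shows why averaging helps: even a modest typical agreement between individual judges yields a considerably more reliable group average as the number of judges grows.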

Table 5 shows the homogeneity and alpha values for each 
judging category for both the seven judge faculty group and 
the nine judge student group. Since alpha is a prediction 
of the inter-group correlation, based on the assumption of 
uncorrelated measurement error (deviations from the average), 
a correlation of the group averages will provide a test of the 
assumption. The inter-group correlations are given in the last 
column of Table 5. 

The inter-group correlations shown are in the same range 
as the predicted reliability of the group average, justifying 
tentative acceptance of the assumption of uncorrelated 
deviations. The pattern across judging category is also 
very consistent: there is considerably more inter-judge 
agreement with regard to the "Intonation" and "Overall" 
categories than there is for the "Vibrato", "Rhythm" and 
"Dynamics" categories. 

Intra-judge Correlations 

A comparison of intra- and inter-judge consistency is of 

Table 4 

Correlation of Each Judge 
with Average of Other Judges' Scores 

Judging                              Judge 
Category             1      2      3      4      5      6      7 
1. Intonation       .59    .67    .63    .68    .44    .66    .59 
2. Vibrato          .58    .56    .00    .57    .48    .63    .43 
3. Rhythm           .55    .44    .47    .46    .36    .42    .34 
4. Dynamics         .55    .53    .09    .28    .43    .40    .53 
5. Overall          .70    .66    .43    .63    .47    .71    .58 
6. Avg. of 1-4      .65    .65    .43    .65    .55    .77    .55 








Table 5 

Summary of Inter-judge Correlations, 
Inter-group Correlations 
(Seven Faculty, Nine Student Judges) 

Judging            Homogeneity         Reliability       Inter-group 
Category         Faculty  Student    Faculty  Student    Correlations 
1. Intonation      .43      .40        .84      .86          .84 
2. Vibrato         .28      .29        .73      .79          .86 
3. Rhythm          .27      .30        .72      .80          .78 
4. Dynamics        .24      .24        .69      .74          .73 
5. Overall         .41      .39        .83      .85          .86 
6. Avg. of 1-4     .41      .43        .83      .87          .88 

interest to check the possibility of systematic differences 
between judges. If the judge's self-consistency is 
significantly greater than that between judges, some modification 
of the random error assumption may be necessary. 

The three judges who regraded the performance set were 
Judges #1, #3 and #4. Particular interest was attached to the 
self-consistency of Judge #3, who had the lowest correlation 
with the group average as a member of both the faculty and 
student groups. The regrade correlations, by category, are 
given in Table 6. 

The regrade correlations are higher than most of the 
inter-judge correlations, with most values falling within or slightly 
above the range for correlations of each judge with the average 
of the other judges' scores. In particular, the high 
consistency shown by Judge #3 indicates a stable grading procedure 
based on standards different from those used by the majority 
of the judges. 



Inter-Category Correlations 

There are two possible sources of inter-category dependency. 
First, training and experience may tend to improve all aspects 
of performance, so that the performer who sings with the proper 
intonation will be more likely to have an acceptable vibrato 
than one who sings with poor pitch control. Second, it may 
be tacitly assumed by a judge that the preceding is true, 
in which case his judgement of one category may bias his grade 
in another category (called a "halo" effect.) 

The between-category correlations of the average scores 
are shown in Table 7 for the five categories scored by the 
judges. Correlations between the first four categories and 
the "Avg. of 1-4" category would contain spurious components, 
and are not shown. However, the correlation between the 
"Overall" and "Avg. of 1-4" categories was .96, indicating 
a very high predictability for the "Overall" grade on the 
basis of scores in the four other categories. 

"Intonation" is the most distinct of the categories, 
having the lowest inter-category correlations. It is also 
unique in having as high a reliability as the "Overall" 
category. 



Summary of Criterion Results 

An analysis of the faculty and student judges' scores for 
the sixty-two performances has shown that the average grade 
for each performance is sufficiently reliable in each judging 

Table 6 

Regrade Correlations for Three Judges 

Judging            Judge    Judge    Judge 
Category            #1       #3       #4 
1. Intonation       .67      .58      .58 
2. Vibrato          .60      .66      .45 
3. Rhythm           .50      .66      .26 
4. Dynamics         .55      .57      .31 
5. Overall          .59      .69      .43 
6. Avg. of 1-4      .69      .74      .56 


Table 7 

Intercorrelations of Performance 
Judging Categories 

Judging 
Category             1       2       3       4       5 
1. Intonation      1.00     .66     .63     .68 
2. Vibrato                 1.00     .77     .82 
3. Rhythm                          1.00     .84 
4. Dynamics                                1.00 
5. Overall          .86     .84     .84     .66    1.00 

category to serve as a stable criterion for the multiple 
regression analysis. The most stable categories are 
"Intonation", "Overall" and "Avg. of 1-4", with predicted 
reliabilities for the faculty judges ranging from .83 to 
.84. The reliabilities of the "Vibrato", "Rhythm" and 
"Dynamics" categories are considerably lower, ranging from 
.69 to .73. Comparison of group averages for the faculty 
and student judges substantiates the reliability prediction, 
except for the "Vibrato" category. The two groups have a 
greater level of agreement on the "Vibrato" scores than 
would be predicted from the inter-judge correlations. The 
very low correlations of Judge #3 with the other judges in 
both groups are the primary source of this disagreement, so 
that the inter-group value for "Vibrato" (.86) is considered 
to be a valid measure. 



ANALYSIS OF THE PREDICTOR VARIABLES 

As indicated in "Procedures", the predictor variables were 
based on deviations in the performance features from a set of 
standard values. Two different standards were used: Predictor 
Set #1 was derived from the best performance ("Operational 
Standard"); Predictor Set #2 was based on a literal 
interpretation of the musical score ("Literal Standard"). The 
correlations between these predictors and the criterion scores 
are presented in Tables 8 and 9. The predictors are grouped 
according to category: duration, #1-5; pitch, #6-11; sound 
power, #12 and #13. All the predictors in Set #2 differ from 
those in Set #1, except for the duration ratio, since its 
definition is not relative to any standard. 



MULTIPLE REGRESSION ANALYSIS 



Simulation of the Judges' Response 

The results of the computer simulation of the human 
judgements are presented in Table 10, along with some associated 
statistics. In the first column, the reliability estimate 
for the average judges' score is given for each category. 
The multiple regression coefficients found using the thirteen 
predictors from Sets #1 and #2 on the sample of 62 performances 
are tabulated in column two. Except for the "Intonation" 
category for Predictor Set #2, all of these values are 
statistically significant beyond the 5% level, determined using the 
F-test. Two categories in Set #1, "Vibrato" and "Avg. of 
1-4", are significant beyond the 1% level. 

Column three, the "shrunken" multiple regression
coefficient, is an estimate of the correlation between the
predicted and the averaged judges' scores which may be
expected for a new sample of the same size from the same
population, when the predicted scores are calculated
using the predictor weightings (b-weights) found for the
original sample. This validity estimate will always be
lower than the value found for the first sample, since
it is expected that some of the variance accounted for will
not correlate in the new sample. For a fixed number of
samples, the amount of variance accounted for by the multiple
regression coefficient increases as the number of predictors
increases. The equation used to calculate the expected
shrinkage is called the Wherry Formula (Kelley, 1947, p. 474)
and has the form:

    R_s^2 = \frac{(N - 1) R^2 - n}{N - n - 1},

where "R_s" is the shrunken multiple regression coefficient,
"R" is the coefficient found for the sample, "N" is the
number of subjects in the sample, and "n" is the number of
predictors.
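
As a numerical check on the shrinkage formula, the following Python sketch evaluates it for one entry. The function name, and the choice to report zero when the estimate becomes negative, are assumptions made for illustration and are not taken from the report. With R = .58, N = 31 and n = 6 it gives .41, which agrees with the "Intonation" entry for subset "31A" in Table 12.

```python
import math


def wherry_shrunken_r(r: float, n_subjects: int, n_predictors: int) -> float:
    """Shrunken multiple R, using R_s^2 = ((N - 1) R^2 - n) / (N - n - 1)."""
    dof = n_subjects - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more subjects than predictors + 1")
    r2_shrunken = ((n_subjects - 1) * r * r - n_predictors) / dof
    # Assumption: a negative estimate is reported as zero rather than
    # as an imaginary coefficient.
    return math.sqrt(max(r2_shrunken, 0.0))


# R = .58 found on a 31-performance subset with 6 predictors
print(round(wherry_shrunken_r(0.58, 31, 6), 2))  # prints 0.41
```

Because the correction grows with the number of predictors relative to the number of subjects, the same sample R shrinks more for thirteen predictors than for six.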

For comparison purposes, it is convenient to have a
coefficient that has been normalized over varying criterion
reliabilities, so that it represents the predictability
relative to a perfectly reliable criterion. A multiple
regression coefficient calculated from the shrunken coefficient
and corrected for attenuation due to criterion unreliability
is presented in column four. This value is calculated by
dividing the shrunken multiple regression coefficient by the
square root of the reliability of the criterion variable
(Kelley, 1947, p. 412). From this value, an expected multiple
regression coefficient can be calculated for a new sample,
where criterion reliability is known, by multiplying by the
square root of the new reliability.
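
The two operations described in this paragraph can be sketched in Python as follows. The function names and the numbers in the example are hypothetical; only the division and multiplication by the square root of the reliability come from the text.

```python
import math


def correct_for_attenuation(r_shrunken: float, criterion_reliability: float) -> float:
    """Divide the shrunken coefficient by sqrt(criterion reliability)."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return r_shrunken / math.sqrt(criterion_reliability)


def expected_r_for_new_criterion(r_corrected: float, new_reliability: float) -> float:
    """Rescale a corrected coefficient to a criterion of known reliability."""
    if not 0.0 < new_reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return r_corrected * math.sqrt(new_reliability)


# Hypothetical values: shrunken R of .60 against a criterion of reliability .81
r_corrected = correct_for_attenuation(0.60, 0.81)
print(round(r_corrected, 3))
# Expected coefficient if the new criterion had reliability .90
print(round(expected_r_for_new_criterion(r_corrected, 0.90), 3))
```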

In order to determine the uniformity of the sample under
multiple regression analysis, the scores for each of the
subsets used in the cross-validation were calculated, using
the b-weights figured for all 62 performances. These were
then correlated with the criterion scores. The results are
shown in Table 11. This was of particular interest because
of the non-random subdivision of the performance selections
(labeled "36" and "26"). A clear difference between predictor
sets 1 and 2 can be seen here. Whereas for both sets of
predictors "31B" is somewhat higher than "31A" for most
categories, a reversal occurs for the "36" and "26"
subsets when Set 2 predictors are used.



Cross Validation 

Two different partitions have been used in the
cross-validation calculations. The first partition was randomly
chosen to provide two subsets of 31 performances each.
The second partition divided the sample into the 36
Handel selections and the 26 Mendelssohn selections. Tables
12 and 13 present the results of these partitions and the
subsequent calculations. The two tables are identical in
format.

In the first column of each three-column block of Tables
12 and 13, the multiple regression coefficient for the subset
is presented. This coefficient is calculated, using the six
predictors listed in Table 2, by applying step-wise multiple
regression to the subset indicated at the top of the column.
The b-weights (regression coefficients) generated in this
step are then applied to the remaining performances in the
partition, and are used to predict the judges' scores on this
subset, which was not involved in the multiple regression
analysis. The results of correlating the predicted scores with
the judges' averaged scores are presented in column three,
for the subset indicated at the top of the column.
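
The procedure just described (fit b-weights on one subset, apply them to the other, correlate predicted with observed scores) can be sketched in Python as below. This is a minimal illustration under stated assumptions: it uses ordinary least squares on all six predictors rather than the step-wise regression used in the report, the data are synthetic, and every function name is hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length score lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fit_b_weights(X, y):
    """Least-squares weights (intercept first) from the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(b, X):
    """Apply b-weights (intercept first) to each row of predictors."""
    return [b[0] + sum(w * v for w, v in zip(b[1:], x)) for x in X]


def cross_validate(X_fit, y_fit, X_new, y_new):
    """Return (R on the fitting subset, r on the held-out subset)."""
    b = fit_b_weights(X_fit, y_fit)
    R = pearson(predict(b, X_fit), y_fit)      # column one
    r_new = pearson(predict(b, X_new), y_new)  # column three
    return R, r_new


# Synthetic stand-in for 62 performances with six predictors each
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(62)]
y = [0.5 * x[0] - 0.3 * x[3] + random.gauss(0, 1) for x in X]
R, r_new = cross_validate(X[:31], y[:31], X[31:], y[31:])
print(f"R on fitting half = {R:.2f}, r on held-out half = {r_new:.2f}")
```

Swapping the two halves in the call to `cross_validate` gives the second three-column block, which is how a lack of reciprocity between subsets would show up.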

The significance levels for column one (multiple 
regression coefficients) were calculated using the F-test, 
with degrees of freedom appropriate for six predictors and 
N=31, 36 or 26 performances. Significance indications for
column three were based on a one-tailed t-test, which places 
the levels of rejection for the null hypothesis lower than 
those used in column one, appropriate for a correlation between 
two independently generated sets of scores. 
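
The report does not print the test statistics themselves. As a sketch, assuming the usual textbook forms were used, the F statistic for a multiple regression coefficient and the t statistic for a simple correlation can be computed as below; the function names are hypothetical, and the resulting values still have to be compared with tabled critical values for the stated degrees of freedom.

```python
import math


def f_statistic(R: float, n_subjects: int, n_predictors: int):
    """F for a multiple R, with (n, N - n - 1) degrees of freedom."""
    df1 = n_predictors
    df2 = n_subjects - n_predictors - 1
    f = (R * R / df1) / ((1.0 - R * R) / df2)
    return f, df1, df2


def t_statistic(r: float, n_subjects: int):
    """t for a correlation r between two score sets, with N - 2 degrees of freedom."""
    df = n_subjects - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df


# Illustrative inputs only: a multiple R of .75 from 31 performances and
# six predictors, and a cross-validation correlation of .33 on 31 performances.
print(f_statistic(0.75, 31, 6))
print(t_statistic(0.33, 31))
```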

The second column, labeled "R_s", presents the shrunken
coefficient calculated from the values given in the first
column. This is an estimate, based on the Wherry formula,
of the success of prediction in a new sample from the same
population, and is used here as a guide in evaluating the
cross-validation results. When column three equals or
exceeds the value in column two, reasonable confidence can
be placed in the generalizability of the coefficients of
the regression equation. This occurs for Predictor Set
One in all categories except "Intonation". There is not,
however, complete reciprocity between the performance halves.
The b-weights generated for "31B" do not generalize as well
as those generated for "31A" when the predictors from Set 1
are used. The reverse is true for the Set 2 predictors.

The "Intonation" predictions are uniformly intransigent,
indicating that the information concerning the relevant
intonation characteristics was lost in either the feature
extraction or subsequent reduction.

The natural subsets, as shown in Table 13, provide
useful levels of generalizability in only one case: the
application of the b-weights generated from the Mendelssohn
performances ("26") to the Handel performances ("36").
Again, intonation remains unpredictable in the new sample.

The lack of reciprocity in each of these cases is
due solely to accidental correlation, and to the inability
of the multiple regression analysis to distinguish between
accidental and generalizable regularities in the data.



Table 12

Cross-Validation
of Random Subsets


Predictor Set #1: Operational Standard, 6 Predictors

Judging           b-wts from 31A          b-wts from 31B
Category          R      R_s   Perf. 31B  R      R_s   Perf. 31A

1. Intonation     .58    .41   .33*       .66    .54   .19
2. Vibrato        .75#   .68   .56**      .62    .48   .54**
3. Rhythm         .68    .58   .58**      .54    .33   .59**
4. Dynamics       .60    .45   .60**      .67    .56   .50**
5. Overall        .57    .40   .62**      .69    .59   .41*
6. Avg. of 1-4    .68    .57   .62**      .67    .56   .53**


Predictor Set #2: Literal Standard, 6 Predictors

Judging           b-wts from 31A          b-wts from 31B
Category          R      R_s   Perf. 31B  R      R_s   Perf. 31A

1. Intonation     .66    .54   .12        .62    .48   .23
2. Vibrato        .66    .54   .43**      [?]    .66   .53**
3. Rhythm         .69    .59   .42**      .68    .57   .64**
4. Dynamics       .61    .47   .50**      .77#   .70   .54**
5. Overall        .57    .39   .43**      .74#   .67   .52**
6. Avg. of 1-4    .67    .56   .41*       .73#   .64   .62**

Note:  # indicates p < .05 (F-test)
       * indicates p < .05 (t-test)
      ** indicates p < .01 (t-test)
     [?] illegible in source



SUMMARY 

The results reported in this section, based primarily
on correlational analysis, have shown the level of inter-judge
and inter-group agreement to be sufficiently stable to
allow the averaged scores to be used as a reasonable criterion
variable.

Thirteen predictor variables, based on the acoustic
features extracted from the performance, separately
correlate with the criterion scores with a maximum value of
.41. Used in combination in an optimized multiple regression
equation to simulate the judges' averaged response, the
maximum correlation (occurring for the "Vibrato" category) is
increased to .76.

Cross-validation checks indicate that a generalizable
set of coefficients is possible for all categories but
"Intonation" when these predictors are used. The best
cross-validation results approximate the highest judges'
correlations with the averages of other judges' scores.



Table 13

Cross-Validation
of Natural Subsets
(Handel and Mendelssohn)


Predictor Set #1: Operational Standard, 6 Predictors

Judging           b-wts from 36           b-wts from 26
Category          R      R_s   Perf. 26   R      R_s   Perf. 36

1. Intonation     .53    .36   .15        .66    .51   [?]
2. Vibrato        .75#   .69   .40*       .53    .24   .62**
3. Rhythm         .71#   .6[?] .15        .58    .35   .52**
4. Dynamics       .68#   .59   .27        .60    .40   .49**
5. Overall        .63    .53   .20        .62    .44   .4[?]**
6. Avg. of 1-4    .72#   .65   .29        .64    .47   .53**


Predictor Set #2: Literal Standard, 6 Predictors

Judging           b-wts from 36           b-wts from 26
Category          R      R_s   Perf. 26   R      R_s   Perf. 36

1. Intonation     .50    .31   .27        .58    .36   .13
2. Vibrato        .70#   .62   .20        .64    .47   .15
3. Rhythm         .66#   .56   .23        [?]    .63   .41**
4. Dynamics       .62    .51   .46**      .77#   .68   .28*
5. Overall        .54    .39   .14        .69    .56   .19
6. Avg. of 1-4    .65    .55   .34*       .71    .58   .30*

Note:  # indicates p < .05 (F-test)
       * indicates p < .05 (t-test)
      ** indicates p < .01 (t-test)
     [?] illegible in source



DISCUSSION AND CONCLUSIONS 



CRITERION SCORES

The performances chosen for this experiment were selected
from a real adjudication situation. The performers were
high-school girls who were considered by their teachers to be the
most capable singers to send to the All-State competition.
The range of the performance abilities represented in the
sample was considered by the judges to be quite narrow.
Under these circumstances, one might expect widely divergent
responses from judging panels.

The reliabilities and inter-group correlations found
(Table 5) show that, while reliabilities for single judges
are low, a stable criterion can be established by averaging
the scores of a number of judges. The highest seven-judge
reliabilities, in the range of 0.83, were found for the
"Intonation" and "Overall" categories. Strong confirmation
of the validity of these reliability values is provided by
the inter-group correlations, which equal or exceed the
predicted values in almost every case.

It should be noted that the judges did not discuss the
meanings of the category names or otherwise attempt to
standardize their responses. Presumably, any such procedure
would improve the inter-judge agreement. A wider range of
performance abilities represented in the sample should
also increase the homogeneity of the judges' responses, as
well as providing for easier predictor selection and testing.



STANDARDS FOR EVALUATING COMPUTER PERFORMANCE 

Three levels of agreement on performance scores are
represented in the data for the criterion variable. At the
lowest level are the inter-judge correlations. The middle
values are found for the regrade correlations, and the
correlations of each judge with the average of the other
judges. The highest values are the reliabilities and
inter-group correlations. Representative values for both high and
low judging categories within each level are given in Table 14.

The three levels are presented here to provide bench
marks in evaluating the simulation results. Previous studies
have compared the computer results to the inter-judge correlations,
which are at the low end of the agreement level scale. It
seems quite reasonable, however, to expect simulation results
to exceed the middle range values, and to begin to approach
the values of the inter-group correlations. The computer,
in effect, has information none of the human judges has:
a representative set of average scores. If it is to be
compared with the human judge, the simulation must not only
correlate on the lowest level with each of the human judges,
but must have a correlation with the judges' average scores
which approximates the middle level.



Table 14

Criterion Correlation Ranges

Judging                Agreement Level
Category          Low     Middle    High

High*             .42     .63       .84
Low**             .26     .47       .71

*  The high categories are Intonation, Overall, Avg. of 1-4.
** The low categories are Vibrato, Rhythm, Dynamics. In several cases,
   Vibrato moves into the high range.



For the purpose of assigning grades in a school situation,
correlations of 0.80 or greater between the predicted and the
judges' average scores would be desirable. Since prediction
accuracy cannot be expected to exceed the criterion reliability,
somewhat higher reliabilities are required. With a homogeneity
value of 0.40, fourteen judges are required to exceed a
reliability of 0.90. For example, with the combined scores of
the student and faculty panels (16 judges), the homogeneity
and the resulting reliability for the average score in each
category are shown in Table 15.



Table 15

Homogeneity and Reliability for 16 Judges

Judging
Category          Homogeneity   Reliability

1. Intonation     .41           .92
2. Vibrato        .30           .87
3. Rhythm         .26           .86
4. Dynamics       .24           [?]
5. Overall        .40           .91
6. Avg. of 1-4    .42           [?]

[?] illegible in source
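
The figures quoted above are consistent with the Spearman-Brown relation for the reliability of an average of k judges, r_kk = k r / (1 + (k - 1) r), where r is the homogeneity. The passage does not name the formula, so the Python sketch below assumes that this relation is the one intended; the function names are hypothetical.

```python
def reliability_of_average(homogeneity: float, n_judges: int) -> float:
    """Spearman-Brown reliability of the mean score of n_judges judges."""
    return n_judges * homogeneity / (1.0 + (n_judges - 1) * homogeneity)


def judges_needed(homogeneity: float, target: float) -> int:
    """Smallest panel size whose average score exceeds the target reliability."""
    if not 0.0 < homogeneity < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("homogeneity and target must lie strictly between 0 and 1")
    k = 1
    while reliability_of_average(homogeneity, k) <= target:
        k += 1
    return k


print(judges_needed(0.40, 0.90))                     # prints 14
print(round(reliability_of_average(0.41, 16), 2))    # prints 0.92
```

Under this assumption, thirteen judges at a homogeneity of 0.40 give a reliability just under 0.90, which is why fourteen are required.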



PREDICTOR SIMULATION OF CRITERION SCORES 
Multiple Regression Results 

The linear regression approximation to the criterion has
been shown to be moderately successful using the predictors
based on the operational standard (Set 1). The shrunken
multiple regression coefficients for these predictors are all
within the middle range specified in Table 14. Two anomalies
should be noted for these values, however. The value for
"Intonation" is at the typical value for the lower group of
categories, whereas, according to the reliability grouping,
it should be at 0.63 or above. The vibrato value, at 0.68,
exceeds its expectations as a member of the three categories
with lower reliabilities. This is particularly notable
since there are no predictors which are nominally "Vibrato"
predictors, and there are three nominal "Intonation" predictors.

The literal standard (Set 2 Predictors) proved to be
less effective, overall, than the operational standard. Losses
in four categories far outweighed the very slight gain in the
other two. This may be taken as an indication that whatever
constitutes a musically acceptable performance, it is not the
literal interpretation of the score, at least not for these
vocal performances.



Cross Validation Results 

Again with the exception of the "Intonation" category,
the cross validation results are in the middle range specified
in Table 14, for at least one set of predictor weights.
The results are not reciprocal: that is, the b-weights
generated on subset "31A" predict well for subset "31B",
but those generated on "31B" are in general less effective on
"31A". The reverse is true for the Set 2 predictors.

The difference between the results of Set 1 and Set 2 
predictors is less pronounced in the cross validation than 
it is in the simulation based on the full sample of 62 per- 
formances. This is probably due to the choice of predictors 
used in the reduction from 13 to six for the cross-validation. 

The natural subsets (the Handel and Mendelssohn performances)
show somewhat lowered cross validation results, and again
reciprocity is lacking. However, except for "Intonation",
the best results are in the mid-range indicated in Table 14.



GENERALIZABILITY 

Because of the preliminary nature of the simulation
attempted in this experiment, and the improvements in feature
extraction expected in the near future, it does not appear to
be useful to specify the particular, normalized regression
weights found for this sample. In addition, it is expected
that the regression weights will be both instrument and context
sensitive. Many more experiments will be needed to test the
range of commonalities over various musical sources and forms.

The methodology, however, as it is used to extract features,
compare them to a standard and then reduce them to predictors,
is expected to generalize to most musical sources and forms.
In particular, because this test was conducted with a random
selection of performances from a real adjudication situation,
and because of the high level of intergroup agreement on the
criterion scores, it is believed that the levels of significance
achieved here can be equalled and improved upon for other
similar samples in many musical performance areas. However,
considerably higher levels of prediction than those represented
here will be required before practical applications can be
pursued.



FUTURE RESEARCH 

Many branches for investigation appeared during the 
course of this project which could not be included. Some of
these will be pursued under a grant provided by the United 
States Department of Health, Education and Welfare. The 
following items were considered to be of particular importance: 



1. In place of the multiple regression approach, the
methods of the "adaptive set" using maximum likelihood
approximations (Sebestyen, 1962) should be tried. This
approach makes class assignments, rather than predicting a
scale value, and can take into account disjoint sub-classes.

2. The performance standard used in this study was
taken from within the sample set. An "ideal" performance
standard should be tested, that is, a performance which is
significantly better than any of those in the sample set in
all respects.

3. An analysis of variance test for the uniqueness
of the judging categories should be applied, and an attempt
made to extract the "halo" effect.

4. Automatic feature extraction procedures must be
developed, so that adequate samples may be analyzed in a
reasonable amount of time. Because of the enormous amounts
of information to be processed, hand processing at any
point in the sequence is an intolerable bottleneck.

In addition to these modifications of the present 
experiment, a number of new, related experiments should be 
started: 

A. Patterns and methods of structuring human judgments
of musical performance need a great deal of investigation
before models can be developed that have wide applicability.
The patterns suggested by the student and faculty judgements
in this study would have important implications if they were
found to persist for other groups. Direct responses, such
as "galvanic skin response" (skin resistivity), respiration
rate, and pulse rate, should be recorded for test groups
as well as the verbal response to sets of performances.

B. Synthesis, in the form of graded performances, 
based on the results of studies such as this one, should 
be used to determine what elements of the performance can 
be manipulated to produce predictable responses in a panel 
of judges. The goal of such synthesis should be to maintain 
as natural a setting as possible, while changes in only one 
element of the performance (such as pitch or loudness contour) 
are made. 



SUMMARY 

While this investigation has not established conclusively
the practicality of simulating the pooled responses of human
judges to short vocal performances, the results are
encouraging. The average judges' score can be accepted,
on the basis of the results reported here, as a stable
criterion for simulation. The simulation effort produces
scores which correlate with the criterion (the average of
the judges' scores) in the same range (0.47 to 0.63)
as do the human judges. This is considerably higher than
the inter-judge correlations (0.26 to 0.42) and appears
particularly encouraging in view of the lack of precedent
to provide guidance in selecting and calculating the
acoustic features used for prediction.

The only exception to the general level of simulation
and prediction was found in the "Intonation" category. Low
correlations here are particularly puzzling because of the
high criterion reliabilities for this category. In spite
of this exception, the results found in this experiment
show that operationally defined variables can be found which
will predict, within limits, the subjective response of a
group of musically adept listeners. This is a necessary
(but not a sufficient) condition for the inverse process
described in the "Statement of Purpose": the definition
of acceptability limits for performance in terms of
operationally defined variables.

Considerable work is still required before this approach
can be considered a useful tool, either for direct
application to adjudication or for the uncovering of basic
correlational relationships between subjective and objective
acoustic variables. The results of this experiment lend
strength to the belief that this will be a productive
approach, and that it deserves serious consideration in
any research program directed toward understanding musical
perception and performance in terms of measurable, rather
than mystical, concepts.



REFERENCES 



Benson, M. A. "Spurious Correlation in Hydraulics and Hydrology."
Journal of the Hydraulics Division, Proceedings of the
American Society of Civil Engineers, July 1965, pp. 35-42.

Kelley, T. L. Fundamental Statistics. Cambridge: Harvard
University Press, 1947.

Page, E. B., and Paulus, D. H. The Analysis of Essays by
Computer. Final Report, HEW Project No. 6-1318. Storrs:
University of Connecticut, 1968.

Rozeboom, W. W. Foundations of the Theory of Prediction.
Homewood, Ill.: Dorsey Press, 1966.

Sebestyen, G. S. Decision-Making Processes in Pattern
Recognition. New York: Macmillan, 1962.





APPENDIX A 



INFORMATION GIVEN TO JUDGES 



A. Instructions to Judges

Sixty-two student "performances" of fifteen to thirty
seconds duration each are presented on tape, with a
performance number preceding and a ten second pause following
each selection. All the performers are sopranos or altos.
Thirty-six sing the first few measures of "He shall feed
His Flock" from the "Messiah". The remaining twenty-six
performances consist of the first and second occurrences of
the phrase beginning "If with all your hearts..." from
"Elijah", sung by thirteen singers. The halves of these
thirteen selections have each been given a performance
number and are presented separately on the listening tape.
While some voices may be distinctive enough to link the two
performance halves, they are to be judged independently,
as far as this is possible.

The judging form consists of a general evaluation of 
several categories, with ratings from "A", the best, to 
"E", the worst. Because of the number of judgements to be 
made, it may be necessary to listen to the tape more than 
once, or stops between the performances may be preferred, 
with an immediate replay when necessary. Any auditioning 
scheme which produces consistent results can be applied. 

An attempt should be made to distribute the grades so
that the categories are approximately equal, e.g. 15 A's,
15 B's, etc. However, wide latitude can be taken with this
distribution if the data demands it. More important to
remember is that "A" is defined here to mean the best of
five categories present in these performances, and "E",
the worst in these performances.

The column headed "Overall" is not intended to be a
summary of the other categories. It is recognized that
aspects of performance other than those specifically
mentioned may greatly modify the overall impression.
This judgement should therefore be made independent of the
ratings in the other categories, if at all possible.

In addition, please indicate how you would delineate
the judging categories in order to produce the most useful
form for grading or diagnostic purposes. Also include any
critical comments on the mechanics of presentation, e.g.
selections too short, silent spaces too long, etc.

C. Excerpt from Handel's "Messiah"

[Musical score; text underlay: "He shall gather the lambs
with His arm, with His arm."]

D. Excerpt from Mendelssohn's "Elijah"
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APPENDIX B 



DATA NORMALIZATION AND REDUCTION 



Six features were measured for each tone of each
performance, and were stored in a data matrix for that
performance. The values punched into data cards were the
chart scale readings for pitch and amplitude, and chart
lengths in inches for durations.

The following steps were needed to transform the
data matrix, and to calculate the predictors from the
normalized data. Table A-1 shows a sample data matrix
with the column headings referred to in the description
of the calculations.

1. Determination of Time Factor from Starting Pitch.
The time factor (speed change) used to fit the performance
range into the frequency "window" of the FM Discriminator
(frequency meter) was determined by comparing the original
starting frequency to the one indicated by the discriminator
calibration:

Adj. Start. Freq. = C1 (Chart Value) + C2,

where C1 and C2 are calibration factors associated with the
frequency meter. The time factor is simply the ratio of the
actual and the adjusted starting frequencies:

Time Factor = Actual Start. Freq. / Adj. Start. Freq.

Effective Chart Speed = Actual Chart Speed * Time Factor

2. Conversion of Duration Data to Seconds.

With the effective chart speed known, the duration data
(columns 1 and 2) can be converted from chart distance to
seconds:

Duration = Chart Value / Effective Chart Speed

3. Conversion of Frequency Data to Cents.

The frequencies (columns 3, 4 and 5) are calculated from
chart values as in step one. Tape speed change is accounted
for by multiplying by "Time Factor". Conversion to cents
normalizes the pitch values so that all performances of
the same selection are comparable:

Freq. = (C1 (Chart Value) + C2) * Time Factor

Pitch = Start. Pitch + (1200 * log2 (F/F0)),

where F0 is the frequency of the starting tone.

4. Conversion of Sound Power Data to Decibels.

The chart value representing output voltage (column 6)
is converted to a decibel scale referenced to the smallest
peak value, V_min. Since recording levels were not
standardized, only changes within each performance can be compared:

Sound Power (dB) = 20 * log10 (V / V_min)

5. The first predictor: Duration Ratio.

The duration columns, #1 and #2, are summed to give
the total time for each category. The ratio of these two
values, called the "Duration Ratio", is the first predictor.

6. Conversion of Data Matrix to Deviation Scores.

The values in each cell of the data matrix for each
performance were subtracted from the corresponding values of
the performance standard. Two performance standards, the
"Operational" and the "Literal" standard, were employed.
In matrix notation, this can be expressed:

D = S - A,

where "D" is the difference matrix, "S" is the matrix of
standard scores, and "A" is the original matrix. The absolute
values of the difference or deviation scores replace the
original values in the matrix. By summing each column
and dividing by the number of tones, a "mean deviation"
score is found for each column. These values are the next
six predictors.

If each matrix entry is squared and the columns again
summed, and divided by the number of tones, a "mean squared
deviation" from the standard score is produced. These six
values bring the total predictors to thirteen.
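
The six steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original card-based program: the function names are hypothetical, the effective chart speed is assumed to be the actual chart speed multiplied by the time factor, the decibel reference is taken to be the smallest peak voltage, the duration ratio is assumed to be column 1 over column 2, and the ordering of the thirteen predictors is arbitrary.

```python
import math


def chart_to_hz(chart_value, c1, c2, time_factor=1.0):
    """Frequency from a chart reading: (C1 * chart + C2) * time factor."""
    return (c1 * chart_value + c2) * time_factor


def time_factor(actual_start_hz, start_chart_value, c1, c2):
    """Ratio of the actual to the adjusted (as-played) starting frequency."""
    return actual_start_hz / chart_to_hz(start_chart_value, c1, c2)


def duration_seconds(chart_inches, chart_speed, tf):
    # Assumption: effective chart speed = actual chart speed * time factor.
    return chart_inches / (chart_speed * tf)


def cents(freq_hz, start_hz, start_pitch=0.0):
    """Pitch in cents relative to the starting tone."""
    return start_pitch + 1200.0 * math.log2(freq_hz / start_hz)


def decibels(voltage, smallest_peak):
    """Level in dB relative to the smallest peak voltage of the performance."""
    return 20.0 * math.log10(voltage / smallest_peak)


def duration_ratio(matrix):
    # Assumption: ratio of the column 1 total to the column 2 total.
    return sum(row[0] for row in matrix) / sum(row[1] for row in matrix)


def deviation_predictors(standard, actual):
    """Mean absolute and mean squared deviation per column of D = S - A."""
    n_tones, n_cols = len(actual), len(actual[0])
    mean_abs, mean_sq = [], []
    for c in range(n_cols):
        dev = [abs(standard[t][c] - actual[t][c]) for t in range(n_tones)]
        mean_abs.append(sum(dev) / n_tones)
        mean_sq.append(sum(d * d for d in dev) / n_tones)
    return mean_abs, mean_sq


def thirteen_predictors(standard, actual):
    """Duration ratio, six mean deviations, six mean squared deviations."""
    mean_abs, mean_sq = deviation_predictors(standard, actual)
    return [duration_ratio(actual)] + mean_abs + mean_sq


# A tone one octave above the starting tone lies 1200 cents above it
print(cents(880.0, 440.0))
```

Each row of `standard` and `actual` is one tone with the six normalized features, so `thirteen_predictors` returns one list per performance.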



Table A-1
Sample Data Matrix

[Table body not recoverable; surviving column headings:
Tone, Tonal, Non-tonal, Initial, Final]
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APPENDIX II 



THE EFFECT OF THE ATTACK TRANSIENT ON AURAL 
RECOGNITION OF INSTRUMENTAL TIMBRES 

Ralph C. Thayer, Jr.



Introduction 

One characteristic of musical performance which must
be dealt with by performer and teacher without a clearly
defined set of standards is tone quality. Tone quality can
be discussed by teachers and students only in vague terms,
such as "dark", "bright", "round", "thin", etc. These
descriptions mean different things to different people, and
very little, if anything, to the young student. The
instructor can only suggest physical changes, such as position
of the bow or adjustment of the embouchure, none of which
are sure answers to a tone quality problem. The student
does not establish a goal of good tone quality until many
years of experience and exposure to what is assumed to
be good tone quality have been accomplished. Even then the
steps that must be taken to achieve this goal are not
obvious. Indeed, the goal itself may be faulty, with no
standard by which to judge it.

However, before standards of tone quality can be
established, it must be determined what aspects of a musical tone
are important in the aural perception of timbre. A musical
tone consists of three basic parts: the attack, steady-state,
and decay. Most research in the area of tone quality
has dealt with the steady-state portion of the tone,
assuming that this area is virtually unaffected by what
precedes or follows it. "As has been shown by Helmholtz
and others, the timbre of a given tone is determined by
its harmonics, i.e., by the greater or lesser prominence
of some of these harmonics over the others" (Apel, 1947).
Recent investigation has indicated that this definition
may no longer be an accurate description of timbre; that
other factors, notably the attack of the given tone, may
have some bearing on the perception of timbre.

Before any meaningful qualitative evaluation of timbre
can be accomplished, it must be determined what constitutes
timbre: what effect, if any, or to what degree do factors
other than harmonic structure influence one's judgement of
tone quality. If these factors do have an influence, it must
be decided if they can be studied separately or must be
considered in relation to each other in any study of
timbre. Particularly since the advent of magnetic recording
tape it has been recognized that the attack transient seems
to play an important role in the recognition of instrumental
timbre. In synthesizing trumpet tones, the attack is found
to be one of the subjectively important parameters (Risset
and Mathews, 1969). If an attack is important to instrumental
timbre recognition, is the quality of the attack also important?
Should the attack be considered as merely a necessary
embellishment or as an integral part of the characteristic
qualities of a tone?



Problem

The purpose of this study is to determine what effect
the attack transient has on the aural recognition of
instrumental timbres. As we have no objective standards
for good attacks or timbre, this study approaches the problem
by mechanically replacing the attack of one instrument by
the attack of another, to determine if, in this way, the
listener is influenced in his attempt to recognize the timbre
more by the attack or by the remainder of the tone. If
the attack plays an important role, the listener will either
be confused in identifying the instrumental timbre or will
identify the timbre as that of the attacking instrument.
If the nature of the attack significantly influences the
listener at this level of discrimination, it would present
the possibility that the attack would affect evaluation at
a much finer level, such as determining what good tone
quality is.

For the purposes of this study "attack" refers to the 
characteristic beginning of an instrumentally produced 
tone, and includes all of the tone up to the steady-state 
portion, or relatively periodic section. "Decay", or 
"release", refers to the ending of the tone, or that portion 
from the point that can no longer be defined as steady-state 
to the cessation of sound. For want of a better word, 
"steady-state" is meant to include all of the tone beyond the 
attack, including the decay. As the decay is always included in 
the tones presented in this study, this should not cause 
any confusion as to the meaning of the term. "Timbre" 
and "tone quality" are used synonymously to mean those 
qualities which differentiate the tone of one instrument 
from another. 

Limitations 

This study must be limited in several respects. Only 
three pitches are represented. A larger number would become 
extremely awkward to handle statistically, and would present 
a much longer test period to the subjects, causing fatigue. 
The three pitches selected are fairly representative 
of the ranges of the instruments involved. For similar 
reasons the number of instruments is limited to four. The 
instruments can all be classified as soprano wind instruments. 
Therefore, any discussion of results must be assumed to 
apply only to this family of instruments. Similar studies 
using instruments of different families and different 
ranges might produce quite different results. Any 
generalizations as to time and nature of attack must also be 
limited. The number of possible lengths of tones is obviously 
infinite, and no study could attempt to include them all 
specifically. The length of tones presented in this study is 
from one and a half to two seconds. Results obtained from tones 
of much shorter or much longer duration could be entirely 
different, and generalizations must be limited to tones of 
this approximate duration. The attacks utilized in this 
study may be classified as normal, that is, not accented 
and not legato. 

Hypotheses 

For statistical purposes, the null hypotheses are 
stated as follows: 

1. There is no significant difference between instrumental 
timbre recognition scores of subjects presented 
with unaltered instrumental tones and instrumental timbre 
recognition scores of subjects presented with instrumental 
tones altered by substitution of the attack transients of 
other instruments. 

2. There is no significant difference between instrumental 
timbre recognition scores of subjects presented 
with unaltered instrumental tones and instrumental timbre 
recognition scores of subjects presented with instrumental 
tones altered by elimination of the attack transient. 

3. There is no significant difference between instrumental 
timbre recognition scores of subjects presented 
with instrumental tones altered by elimination of the attack 
transient and instrumental timbre recognition scores of 
subjects presented with instrumental tones altered by 
substitution of the attack transients of other instruments. 



Procedures 

A test, designed to measure the effect of attack on 
timbre recognition, was administered to three groups: 
group A, 57 high school instrumentalists; group B, 43 college 
non-music majors; and group C, 38 college level and above 
music majors. 

The members of group A were asked to indicate their 
instrument on the answer sheet. This would allow for 
possible future analysis based upon instrumental background. 
The members of group B were not music majors, but were members 
of a music history class, possibly indicating some musical 
background. The members of group C were music majors, including 
several music faculty and professionally performing musicians. 



Test 



Four instruments were selected from the soprano wind 
family: the flute, the oboe, the clarinet, and the trumpet. 
From the common range of these instruments three pitches 
were selected: d', c'', and g♭''; that is, D below the bottom 
line of the treble staff, C on the third space of the treble 
staff, and G-flat above the top line of the treble staff. 
These pitches were agreed upon by the performers as fairly 
representative of the common range of the four instruments. 
Each instrument was played by a professionally employed 
specialist on that instrument. 

Each of the three pitches was recorded on tape by each 
of the four instruments. A Conn Strobotuner and a 
Hewlett-Packard 400E AC volt meter were placed so as to be easily 
visible to the performers. A standard of pitch and loudness 
was determined for each note. Each performer was able to 
check his performance against these standards by use of the 
tuner and volt meter. A length of approximately one and 
one-half seconds was selected for each note. Each performer 
practised each pitch with the measuring instruments several 
times until the standards were matched, including pitch, 
loudness, and time. Several recordings were made of each 
note in order that the most accurate could later be selected. 

The recording was supervised by a qualified recording 
engineer. A Neumann model U67 condenser microphone, an 
Ampex FR-10 mixer, and an Ampex AG-440 tape recorder were 
the recording instruments. The tones were recorded full 
track, monophonic, at 15 inches per second, on 3M Scotch 
202 recording tape (1.5 mils). A level of minus 2 db on the VU 
meter was used as constant peak level. 

The resultant recording was analyzed and twelve tones 
were selected as most accurately meeting the standards, 
one tone each of the three pitches performed by each of 
the four instruments. These twelve tones became the raw 
material from which the test tape was prepared. 

Each tone was then prepared for the test tape in the 
following manner: unaltered, that is, exactly as recorded; 
with the normal attack replaced by the attack of each of 
the other three instruments on the same pitch; and with 
the attack portion deleted entirely. Treating each of the 
twelve tones in this manner resulted in a total of sixty items. 
On the final test tape each item was repeated, making a total 
of 120 items. 
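
The item count can be checked by enumeration. The following Python sketch is illustrative only; the names `INSTRUMENTS`, `PITCHES`, and `build_items` are hypothetical and are not part of the study's materials. It assumes that every tone receives exactly the five treatments described above.

```python
from itertools import product

# Hypothetical labels for the four instruments and three pitches of the study.
INSTRUMENTS = ["flute", "oboe", "clarinet", "trumpet"]
PITCHES = ["d'", "c''", "gb''"]


def build_items():
    """Return one tuple (treatment, attack, steady_state, pitch) per item."""
    items = []
    for steady, pitch in product(INSTRUMENTS, PITCHES):
        # Unaltered tone: attack and steady-state from the same instrument.
        items.append(("normal", steady, steady, pitch))
        # Attack replaced by that of each of the other three instruments.
        for attack in INSTRUMENTS:
            if attack != steady:
                items.append(("altered", attack, steady, pitch))
        # Attack portion deleted entirely.
        items.append(("no-attack", None, steady, pitch))
    return items


items = build_items()
test_tape = items * 2  # each item appears twice on the final tape
print(len(items), len(test_tape))
```

Under these assumptions the doubled list holds 24 normal, 24 no-attack, and 72 altered items, which agrees with the counts given in the scoring section.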

The portion of the initial tone which could be considered 
"attack" had to be determined. The tones were played at 
slow speeds and the approximate length of the attack was 
determined aurally. All subjective decisions were made by 
a panel of professional musicians. Several experimental 
splices were made to ascertain the exact length of attack 
which would result in the most normal transition from attack 
to steady-state. Also, the effects of straight and diagonal 
splices were determined. These evaluations resulted in 
the use of diagonal splices and an attack length of 
one-twentieth of a second (three-quarters of an inch to the 
middle of the diagonal splice, at 15 inches per second). 
Using these criteria, the attacks of the tones to be altered 
were replaced with the attacks of the other instruments on 
the same pitch. The attacks of the tones to be presented 
with no attack were deleted with a square cut. 
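
The relation between attack duration and splice position is simple arithmetic: tape length equals duration times tape speed. A minimal sketch follows, with hypothetical names (`TAPE_SPEED_IPS`, `tape_length_inches`) chosen only for illustration.

```python
# Tape speed and attack duration as stated in the text.
TAPE_SPEED_IPS = 15.0      # inches per second
ATTACK_SECONDS = 1 / 20    # one-twentieth of a second


def tape_length_inches(seconds, speed_ips=TAPE_SPEED_IPS):
    """Length of tape that passes the head in the given time."""
    return seconds * speed_ips


print(tape_length_inches(ATTACK_SECONDS))  # three-quarters of an inch
```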

The 120 items were placed on the final tape in randomly 
selected order. They were placed in groups of five items 
each. A pause of four seconds was placed between each of the 
items in the first two groups, three seconds between the 
remaining items. Between each group of five, an eight 
second pause was placed. The items were grouped in this 
manner in order to correspond with the answer sheet, which 
was prepared with the items placed in groups of five. 
It was felt this would facilitate keeping one's place on 
the answer sheet without using audible numbers, which 
might tend to distract from the purposes of the test. 
Also, the longer spaces between the first ten items would 
give the subjects time to become familiar with the mechanics 
of the test before requiring more rapid responses. 

"Print-through," or the phenomenon of the recorded 
sound appearing on the previous layer of tape, presented 
problems on the completed tape. In order to eliminate the 
print-through, each item was separated by an appropriate 
length of paper tape, which has no recording properties. 

The subjects were presented, then, with a tape of 
twelve minutes, thirty-seven seconds duration, consisting 
of the 120 prepared items. They were asked to indicate 
which instrument each item suggested to them. The answer 
sheet was prepared with the four possible choices: flute, 
oboe, clarinet, and trumpet. Each subject was asked to 
place a check in the appropriate box for each item. They 
were not told what alterations had been made to the tones, 
only that the tones were synthesized. 



Scoring 

In order to establish a basis for scoring, each 
response which did not identify the steady-state portion of 
the item was considered an error. Using this criterion, 
each paper was corrected and three scores were obtained: 
number of normal tones correct, number of altered tones 
(tones with attacks of other instruments) correct, and 
number of no-attack tones correct. As the test consisted 
of 24 normal tones, 24 no-attack tones, and 72 altered 
tones, the scores of correct normal tones and no-attack 
tones were multiplied by 3 in order to arrive at a possible 
score of 72 correct items for each category. These scores 
were used to calculate the means and to test the null 
hypotheses. 
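
The scaling step can be sketched as follows. This is an illustration under the weights stated above; the dictionary keys, the function name `scaled_scores`, and the sample raw scores are hypothetical and do not come from the study's data.

```python
# Number of items per category on the test and the common ceiling of 72.
ITEMS = {"normal": 24, "no_attack": 24, "altered": 72}
WEIGHTS = {"normal": 3, "no_attack": 3, "altered": 1}


def scaled_scores(raw):
    """Scale raw counts of correct answers to a maximum of 72 per category."""
    for category, count in raw.items():
        if not 0 <= count <= ITEMS[category]:
            raise ValueError(f"{category}: {count} is out of range")
    return {category: count * WEIGHTS[category] for category, count in raw.items()}


# Hypothetical answer sheet: 20 normal, 15 no-attack, 40 altered correct.
print(scaled_scores({"normal": 20, "no_attack": 15, "altered": 40}))
```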

Each error was then charted to show what instrument 
the subject had indicated on the answer sheet. This chart 
provided detailed information as to the actual pitch, 
attack, and steady-state of the item incorrectly identified, 
as well as the response of the subject. This information 
was then compiled on a master chart for each group 
from which analyses were made. 

In order to facilitate compiling the data a system of 
symbols has been devised. The normal tones are represented 
by two letters: the initial letter of the instrument 
(F-flute, O-oboe, C-clarinet, T-trumpet) plus the letter 
name of the pitch; hence TD indicates trumpet sounding d'. 
The initial letter of the attacking instrument is placed 
before this symbol to indicate the altered tones; hence OTD 
represents oboe attack and remainder of the tone trumpet 
sounding d'. The symbol for normal tones followed by (NA) 
indicates the tones with no attack; hence CG(NA) represents 
clarinet sounding g♭'' with no attack. The flat symbol (♭) 
is not used. 
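
The symbol system lends itself to a small encoding routine. The sketch below is an illustration only; the function name `symbol`, its parameters, and the ASCII pitch spellings are assumptions made for the example.

```python
# Initial letters as defined in the text; pitch letters omit the flat sign.
INITIALS = {"flute": "F", "oboe": "O", "clarinet": "C", "trumpet": "T"}
PITCH_LETTERS = {"d'": "D", "c''": "C", "gb''": "G"}


def symbol(steady, pitch, attack=None, no_attack=False):
    """Build the report's symbol for a normal, altered, or no-attack tone."""
    base = INITIALS[steady] + PITCH_LETTERS[pitch]
    if no_attack:
        return base + "(NA)"
    if attack is not None and attack != steady:
        # Altered tone: attacking instrument's initial goes first.
        return INITIALS[attack] + base
    return base


print(symbol("trumpet", "d'"))                      # normal tone
print(symbol("trumpet", "d'", attack="oboe"))       # altered tone
print(symbol("clarinet", "gb''", no_attack=True))   # no-attack tone
```

The three calls correspond to the examples TD, OTD, and CG(NA) given in the text.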



Results 

The means and standard deviations (S.D.) for each group 
and each type of stimulus are presented in Table 1. Table 
2 represents the conversion of these data to percentages. 

The statistical comparisons for each group which tested 
TABLE 1 

MEANS AND STANDARD DEVIATIONS * 

                   Normal          No Attack          Altered 
Group    N      Mean    S.D.     Mean    S.D.      Mean    S.D. 

  A     57     53.32    9.59    43.74    9.50     38.32    6.93 
  B     43     52.81   12.04    45.35   11.46     38.93    9.62 
  C     38     59.63   10.26    50.61    9.37     45.13    8.86 

* Highest possible score = 72 



TABLE 2 

MEANS CONVERTED TO PERCENTAGES 

Group    Normal (%)    No Attack (%)    Altered (%) 

  A        74.01           60.75           53.22 
  B        73.35           62.99           54.07 
  C        82.82           70.29           61.85 

Group A = High School Instrumentalists 
Group B = College Non-Music Majors 
Group C = College (and above) Music Majors 
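
The conversion in Table 2 is taken here to be the mean divided by the highest possible score of 72. The following sketch assumes that reading; the name `to_percent` is hypothetical. Recomputed values agree with most of the printed entries, while a few printed entries differ somewhat, which may reflect rounding or transcription in the original.

```python
MAX_SCORE = 72  # highest possible score, from the note to Table 1


def to_percent(mean_score, max_score=MAX_SCORE):
    """Express a mean score as a percentage of the maximum, to two decimals."""
    return round(100 * mean_score / max_score, 2)


# No-attack means for Groups A, B, and C from Table 1.
for group, mean in [("A", 43.74), ("B", 45.35), ("C", 50.61)]:
    print(group, to_percent(mean))
```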
TABLE 3 

RESULTS OF STATISTICAL TESTS OF SIGNIFICANCE 

                                                        Probability of 
Group  Comparisons                Means           t     Null Hypothesis 

  A    (a) Normal vs No Attack    53.32-43.74   5.293   .001 (Null Hyp. 2) 
       (b) No Attack vs Altered   43.74-38.32   3.452   .001 (Null Hyp. 3) 
       (c) Normal vs Altered      53.32-38.32    . .    Rejected Null Hyp. 1 
                                                        on basis of (a) and (b). 

  B    (a) Normal vs No Attack    52.81-45.35   2.903   .01 (Null Hyp. 2) 
       (b) No Attack vs Altered   45.35-38.93   2.779   .01 (Null Hyp. 3) 
       (c) Normal vs Altered      52.81-38.93    . .    Rejected Null Hyp. 1 
                                                        on basis of (a) and (b). 

  C    (a) Normal vs No Attack    59.63-50.61   6.502   .001 (Null Hyp. 2) 
       (b) No Attack vs Altered   50.61-45.13   2.585   .01 (Null Hyp. 3) 
       (c) Normal vs Altered      59.63-45.13    . .    Rejected Null Hyp. 1 
                                                        on basis of (a) and (b). 


the three null hypotheses stated for the study, are presented 
in Table 3. The comparisons between normal versus altered 
tones (c) were not calculated, since each of these mean 
differences was larger than those of the first two comparisons, 
and therefore was at least as significant as the (a) and 
(b) comparisons. 

Further tests of significance determined no significant 
difference in any of the categories (Normal, No-Attack, 
Altered) between Groups A (high school) and B 
(college non-music majors). There were, however, significant 
differences found between Group C (college level and 
above music majors) and both Groups A and B in all 
categories (see Table 4). 
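
The report does not state which formula was used for the between-group tests. As a rough illustration only, the sketch below computes an unequal-variance (Welch-style) t statistic from the means, standard deviations, and group sizes of Table 1. The function name `welch_t` is hypothetical, the choice of formula is an assumption, and the results need not match the printed values exactly.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent groups with unequal variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Normal tones, Group C versus Group A (values from Table 1).
t_c_vs_a = welch_t(59.63, 10.26, 38, 53.32, 9.59, 57)
# Normal tones, Group A versus Group B.
t_a_vs_b = welch_t(53.32, 9.59, 57, 52.81, 12.04, 43)
print(round(t_c_vs_a, 3), round(t_a_vs_b, 3))
```

For the normal tones the first comparison comes out near 3.01 and the second near 0.23, both close to the values reported for those group pairs.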

Tables and graphs reporting detailed analyses of the 
test results appear at the end of this report. The results 
for Group C (Tables 5 and 6) have been extracted from these 
tables for discussion here, since this group (music majors) 
had the highest correct recognition for normal tones and 
therefore gives the clearest picture of the effects on 
recognition of the instrument when the attack is altered or 
eliminated. 

In Table 6, the capital letters in the left column 
refer to the attacking and steady-state instruments (CF = 
clarinet attack, flute steady-state; (NA) = no attack). 
The lower case letters refer to the pitches d', c'', and 
g♭''. The numbers indicate the number of errors made by 
the group in each category, and what instrument was incorrectly 
identified. 

The oboe tone was least often correctly identified 
among the normal tones. Table 6 shows that most of these 
errors, 55 (2+8+45), resulted from confusion with the 
clarinet, most of these occurring in the upper register 
(45 on g♭''). A much smaller number, 12 (1+3+8), were incorrectly 
identified as trumpet; again, the number of errors increased 
as the range increased. Although the oboe was most often 
confused with the clarinet, and next the trumpet, the 
reverse did not hold true. The number of incorrect clarinet 
identifications was quite small: 18 (4+6+8) incorrectly 
identified as oboe, 8 (5+3) as flute, and 2 (1+1) as 
trumpet. The trumpet tone was most often confused with the 
clarinet, 25 (5+20). Practically all of these occurred 
in the upper register (20 on g♭''). A small number of errors 
in the lower register, 8, resulted from confusion with the 
oboe. Relatively few errors occurred in the identification 
of the flute tone. Those that did occur were confined to 
the low and middle register and were fairly evenly divided 
among the other three instruments. 
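
The error counts above can be related to the percentages of correct responses reported for Group C under one assumption that the text does not state explicitly: that each of the 38 subjects heard each normal tone twice at each of three pitches, giving 38 x 2 x 3 = 228 responses per instrument. The sketch below uses that assumption; all names in it are hypothetical, and the single oboe-as-flute error is taken from Table 6.

```python
SUBJECTS = 38      # Group C
REPETITIONS = 2    # each item appears twice on the tape
PITCHES = 3
RESPONSES = SUBJECTS * REPETITIONS * PITCHES  # 228, an assumed denominator

# Errors on normal tones, keyed by the instrument wrongly chosen.
NORMAL_ERRORS = {
    "oboe": {"flute": 1, "clarinet": 55, "trumpet": 12},
    "clarinet": {"oboe": 18, "flute": 8, "trumpet": 2},
}


def percent_correct(errors, responses=RESPONSES):
    """Percentage of responses that were not errors, to one decimal."""
    return round(100 * (responses - sum(errors.values())) / responses, 1)


for instrument, errors in NORMAL_ERRORS.items():
    print(instrument, percent_correct(errors))
```

Under this assumption the oboe and clarinet figures come to 70.2 and 87.7 percent, matching the normal-tone entries of Table 5.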
TABLE 4 

STATISTICAL TESTS BETWEEN GROUPS 

GROUPS A AND B 

Category        t 

Normal        0.226 
No Attack     0.759 
Altered       0.404 

No significant difference in any category 

GROUPS B AND C 

Category        t 

Normal        2.696 
No Attack     2.219 
Altered       2.967 

Significance beyond the .05 level in all categories 

GROUPS A AND C 

Category        t 

Normal        3.014 
No Attack     3.435 
Altered       4.078 

Significance beyond the .01 level in all categories 



TABLE 5 

CORRECT RESPONSES (IN PERCENTS) FOR GROUP C 

    Flute           Oboe          Clarinet        Trumpet        Av. 
(Steady-state)  (Steady-state)  (Steady-state)  (Steady-state) 

% of Normal Tones Correctly Identified 
     92.5           70.2            87.7            82.9        82.82 

% of No-Attack Tones Correctly Identified 
     89.0           66.6            82.0            46.1        70.29 

% of Altered Tone Steady-States which Were Correctly Identified 
     75.6           50.1            78.5            45.9        61.85 

% of Altered Tone Steady-States which Were Identified as the 
Attacking Instrument 
  [illegible]       26.0            12.0            22.4        17.90 

% of Altered Tone Attacks which Were Identified as the 
Attacking Instrument 
      6.3           15.8            27.8            21.8        17.90 

% of Altered Tone Attacks which Were Identified as the 
Steady-State Instrument 
     61.8           67.8            59.3            61.2        61.85 



TABLE 6 

RESPONSE ERRORS FOR GROUP C 

 FLUTE          Oboe  Clar  Trpt      OBOE           Flute  Clar  Trpt 

          d        3     4     3                d      . .     2     1 
 FF       c        5     2   . .      OO        c      . .     8     3 
          g♭     . .   . .   . .                g♭       1    45     8 

          d        8     1     1                d      . .     4     2 
 F(NA)    c        3     3     1      O(NA)     c        1     5     4 
          g♭       1     7   . .                g♭       5    45    10 

          d       12    10     1                d      . .     3     1 
 OF       c        6    12     1      FO        c      . .    11    20 
          g♭     . .     7   . .                g♭       5    54     3 

          d        8    13     1                d      . .    13     1 
 CF       c        5    10   . .      CO        c        1    10    11 
          g♭       6    16   . .                g♭       2    59     8 

          d        7     4     6                d      . .     3    13 
 TF       c        4    13    11      TO        c      . .     7    49 
          g♭       2     9     3                g♭       3    36    28 

 CLARINET      Flute  Oboe  Trpt      TRUMPET        Flute  Oboe  Clar 

          d      . .     4     1                d      . .     8   . . 
 CC       c      . .     6   . .      TT        c      . .     2     5 
          g♭       3     8     1                g♭       2     2    20 

          d        1     5   . .                d      . .    27     2 
 C(NA)    c       10     5   . .      T(NA)     c      . .    16    15 
          g♭       6    12     2                g♭       5    16    42 

          d      . .     5     1                d        1    31     3 
 FC       c        4    12     1      FT        c        1    11    15 
          g♭      10     4   . .                g♭      21     9    35 

          d      . .     6     2                d      . .    36     2 
 OC       c        3     8   . .      OT        c      . .    13    15 
          g♭       6    15     3                g♭       5    12    45 

          d        1     8     1                d      . .    26     5 
 TC       c        5     7     6      CT        c      . .    10    17 
          g♭       2     5    32                g♭       5    16    42 

The tendencies that are displayed in the identification 
of the normal tones are the bases against which the 
results of the identifications of the no-attack tones are 
compared: a strong tendency for oboe to be confused with 
clarinet, particularly in the upper register; a tendency 
for trumpet to be confused with clarinet, particularly in 
the upper register, less often with oboe in the low register; 
a slight tendency for clarinet to be confused with oboe 
over the entire range, increasing some as the range increases; 
and a tendency for flute to be fairly accurately identified. 

The no-attack portion of the test resulted in a 
comparatively larger number of errors in trumpet recognition. 
While the scores of the other three instruments 
dropped from 3.7 to 5.7 percentage points, the trumpet 
identification scores dropped 36.8 percentage points, 
making the trumpet the least identifiable of the four 
instruments without attack. This occurred as a result of 
an amplification of the tendencies noted for trumpet in 
the normal tones: confusion with clarinet in the upper 
register (42), and, somewhat less, with oboe in the lower 
register (27). The scores of the other three instruments 
were primarily a result of a continuation of the trends 
already noted for normal tones. 

The replacement of the normal attack with the attacks 
of the other instruments resulted in substantially lower 
scores in two categories, flute and oboe. Clarinet 
identifications were somewhat less accurate than the no-attack 
scores (3.5 percentage points), and trumpet very little 
less (0.2 percentage points). 

Flute steady-states were incorrectly identified more 
often as clarinet, regardless of the attacking instrument: 
29 with oboe attack, 39 with clarinet attack, and 26 with 
trumpet attack. With oboe attack, 18 were identified as 
oboe, 2 as trumpet; with clarinet attack, 19 as oboe, 1 as 
trumpet; with trumpet attack, 13 as oboe, 20 as trumpet. 

The effect of the clarinet and trumpet attacks on the 
incorrectly identified oboe steady-states was to cause the 
steady-states to be identified predominantly as the attacking 
instrument. Eighty-two oboe steady-states with clarinet 
attacks were identified as clarinet, mainly in the 
upper register (59), and 90 oboe steady-states with trumpet 
attack were identified as trumpet, mainly in the middle 
register (49). The flute attack resulted in 68 oboe 
steady-states being identified as clarinet, mainly in the 
upper register (54), 24 as trumpet, mainly in the middle 
register (20), and only 5 as flute, in the higher register. 
The clarinet steady-state was most often incorrectly 
identified as trumpet (39) when preceded by the trumpet 
attack, predominantly in the upper register (32), and as 
oboe when preceded by the oboe (29) or flute attacks (21). 

The trumpet steady-state was influenced by all three 
attacks in relatively equal numbers: 127 errors with flute 
attack, 128 with oboe, and 115 with clarinet. The flute 
attack resulted in 23 being identified as flute, 21 in the 
upper register; 51 as oboe, mainly in the lower register 
(31); and 53 as clarinet, mainly in the upper register 
(35). The oboe attack resulted in only 5 being identified 
as flute (upper register), 61 as oboe, and 62 as clarinet. 
As to register, the tendency remained the same: lower for 
oboe (36) and upper for clarinet (45). With clarinet 
attack, the trumpet steady-state was most often identified 
as clarinet (69), next as oboe (43), and least as flute 
(3). The range tendencies remained the same. 

The percentage of attack transients which were identified 
correctly (that is, the subject chose the attacking 
instrument as the predominant timbre) was much smaller than 
the percentage of steady-states correctly identified. The 
clarinet attack was most often correctly identified (27.8%). 
The largest number occurred when it was attached to the oboe 
steady-state (82). When preceding the trumpet steady-state, 
69 were correctly identified, and 39 when preceding the 
flute steady-state. 

Of the trumpet attacks, 21.8 percent were correctly 
identified: 90 when preceding oboe steady-state, 39 before 
clarinet, and 20 before flute. Oboe attacks were correctly 
identified 15.8 percent of the time: 61 with 
trumpet, 29 with clarinet, and 18 with flute steady-states. 
The least often correctly identified attack, flute, occurred 
only 6.3 percent of the time: 23 with trumpet, 
14 with clarinet, and 5 with oboe steady-states. 
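
The attack percentages can be approximated from the counts quoted above, assuming (this is not stated in the text) that each attack was heard on three foreign steady-states at three pitches, twice each, by 38 subjects, for 38 x 2 x 3 x 3 = 684 responses per attacking instrument. The names below are hypothetical. Under this assumption the clarinet, trumpet, and oboe figures are reproduced, while the flute figure recomputes to about 6.1 rather than the printed 6.3.

```python
RESPONSES_PER_ATTACK = 38 * 2 * 3 * 3  # 684, an assumed denominator

# Times the attacking instrument was chosen, per foreign steady-state,
# as quoted in the two paragraphs above.
ATTACK_CHOSEN = {
    "clarinet": [82, 69, 39],   # on oboe, trumpet, flute
    "trumpet": [90, 39, 20],    # on oboe, clarinet, flute
    "oboe": [61, 29, 18],       # on trumpet, clarinet, flute
    "flute": [23, 14, 5],       # on trumpet, clarinet, oboe
}


def attack_percent(counts, responses=RESPONSES_PER_ATTACK):
    """Share of responses naming the attacking instrument, to one decimal."""
    return round(100 * sum(counts) / responses, 1)


for instrument, counts in ATTACK_CHOSEN.items():
    print(instrument, attack_percent(counts))
```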



Summary of Results 

In general, the flute and clarinet steady-states 
were the most recognizable of timbres under these conditions. 
The identification of the flute was slightly less 
accurate than that of the clarinet when preceded by other 
attacks, but was more accurate when preceded by its own or 
no attack. The attack portions of these two instruments are, 
however, in marked contrast to each other. The flute 
attack was least often correctly identified and the clarinet 
attack most often correctly identified of all of the 
instruments included in this study. This would indicate 
that the flute provides very strong identification information 
in its steady-state, but very little in its attack. 
The clarinet provides strong identification information in 
its steady-state and in its attack. However, the combination 
of clarinet steady-state and attack does not result 
in any substantial reinforcement of this information. 

The oboe was the least recognizable of the normal 
tones, followed, but at a relatively large interval, by the 
trumpet. With the removal of the attack portions, the 
trumpet dropped well below the oboe in degree of 
recognizability. With the addition of attacks of the other 
instruments, oboe identification scores dropped considerably, the 
trumpet scores only slightly, but the trumpet remained the 
least accurately identified timbre. The oboe provides, 
overall, the least identification information of any of 
the instruments included in this study. Its steady-state 
was most greatly influenced by the attacks of other 
instruments, and its attack had the least influence on the 
steady-states of other instruments. 

The trumpet attack provided a great deal of identification 
information in combination with the trumpet 
steady-state. Without it, the steady-state was influenced 
by other attacks almost as much as the oboe steady-state. 
But, in combination with other steady-states, the trumpet 
attack did not provide as much identification information 
as the clarinet attack. The trumpet attack, then, does 
reinforce the information provided by the trumpet 
steady-state in identification of this timbre. The steady-state 
by itself or in combination with other attacks is easily 
confused or identified as the attacking instrument, and the 
trumpet attack with other steady-states does not provide as 
much information as one might anticipate, in light of its 
effect on the trumpet steady-state. 

It must be kept in mind that this discussion of 
results is made in relation to the overall effect reported 
in the results. This overall effect is that the identification 
of timbre becomes less accurate (statistically significant, 
P<.01) as the tone progresses from normal, to no-attack, 
to altered. 



Conclusions 

This study has attempted to determine what effect the 
attack transient has on the aural recognition of instrumental 
timbres. In general, the results show significantly 
less identification accuracy when the normal attack transient 
is removed. This difference is not equally displayed 
by each of the instruments, however. The accurate 
identification of flute timbre was affected very little by the 
removal of the attack, in one case, Group B, actually 
increasing in accuracy. The identification of trumpet 
timbre was much less accurate when the attack was removed, 
with the degree of accuracy of identification of oboe and 
clarinet lying between these two extremes. 

With the addition of foreign attacks the identifi- 
cation of the flute and oboe became considerably less 
accurate than in the no-attack state. The identification 
of clarinet and trumpet was only slightly less accurate. 

Further analysis of the results shows that the flute 
and clarinet steady-states were least apt to be identified 
as the attacking instrument; trumpet and oboe, the most 
apt to be identified as the attacking instrument. The 
clarinet attack, when combined with other steady-states, 
was most apt to be identified as the predominant timbre; 
the flute, least. When combined with the clarinet attack, 
other steady-states were least apt to be identified as the 
predominant timbre; with the oboe attack, most. 

It is apparent that the identification of each of 
these instruments is affected differently by the altera- 
tions made in this study, and, from these differences cer- 
tain characteristics can be attributed to each instrument. 

The flute attack is relatively unimportant as an 
indicator of timbre. The flute timbre is quite readily 
identifiable either with or without the attack, and the 
flute attack does not greatly influence the recognition of 
other steady-states. The flute steady-state retains its 
identity quite well in combination with other attacks. The 
flute might be characterized as having a weak attack and 
a strong steady-state. 

The oboe is the least accurately identified, even in 
its normal state, and the absence of attack affects this 
but little. The steady-state is noticeably influenced by 
other attacks, and the oboe attack influences other steady- 
states very little. The oboe might be characterized as 
consisting of both a weak attack and a weak steady-state, 
either in combination or separately. 

The clarinet is only slightly less accurately 
identifiable without its attack than it is with it. The 
steady-state retains its identity the best of any of the 
instruments when combined with other attacks, and the clarinet 
attack substantially influences other steady-states. The 
clarinet might be characterized as having, separately, a 
strong attack and a strong steady-state, but these 
strengths not necessarily reinforcing each other in the 
normal state. 

The trumpet becomes the least accurately identifiable 
without its attack. The substitution of foreign 
attacks does not result in reducing this accuracy 
substantially, but the trumpet steady-state is relatively easily 
influenced by the attacking instrument. Although the trumpet 
attack plays an extremely important role in connection 
with the trumpet steady-state, this influence is not carried 
over to other steady-states as strongly as the influence 
of the clarinet attack. The trumpet might be characterized 
as having, separately, a weak steady-state and a not 
too strong attack. But, in combination with each other in 
the normal state, they reinforce one another substantially. 

It is obvious that no one specific definition of 
timbre can be applied to all of these instruments. On the 
basis of this study, timbre may be defined only as those 
qualities that differentiate the tone of one instrument 
from another. Further studies in this area of all instru- 
ments, and more comprehensive studies of individual instru- 
ments, are indicated in order to arrive at a more accurate 
appraisal of the distinguishing characteristics of each 
instrument. When these characteristics have been deter- 
mined and standardized, the basis for studies dealing with 
the standardization of tone quality (in this sense, the 
determination of good tone quality for each instrument) 
will be established. 
TABLE 7: TABLES OF ERRORS 

(The capital letters in the left column refer to the attacking 
and steady-state instruments: CF = clarinet attack, 
flute steady-state; NA = no attack. The lower case letters 
refer to the pitches d', c'', and g♭''. The numbers indicate 
the number of errors made by the group in each category, 
and what instrument was incorrectly identified.) 

GROUP A 

 FLUTE          Oboe  Clar  Trpt      OBOE           Flute  Clar  Trpt 

          d       30    22     1                d        1     3     3 
 FF       c        7    19   . .      OO        c        1    10     2 
          g♭       3     7     1                g♭       5    54     5 

          d       31    19     3                d        2     4     7 
 F(NA)    c        9    16     1      O(NA)     c        3    24     3 
          g♭       2    11     2                g♭       9    59    12 

          d       68    22   . .                d      . .     2     7 
 OF       c       11    26     6      FO        c        3    32     5 
          g♭       5     8     2                g♭      12    72     3 

          d       53    28     3                d      . .    14     3 
 CF       c       11    30     3      CO        c      . .    34     7 
          g♭      11    22     6                g♭       9    66    21 

          d       22    10    46                d        1     8    10 
 TF       c        9    21    39      TO        c        1    19    33 
          g♭       2    11     2                g♭       9    59    12 

 CLARINET      Flute  Oboe  Trpt      TRUMPET        Flute  Oboe  Clar 

          d        1    12                      d        2    17     6 
 CC       c       11     6   . .      TT        c        2     8     4 
          g♭      16    29     4                g♭      20     8    33 

          d        2    14   . .                d      . .    47     3 
 C(NA)    c       30    12     1      T(NA)     c        3    19    29 
          g♭      19    28     5                g♭      29    26    42 
GROUP A (continued) 

 CLARINET      Flute  Oboe  Trpt      TRUMPET        Flute  Oboe  Clar 

          d        2    15     1                d        2    55     3 
 FC       c       24     4   . .      FT        c        2    21    37 
          g♭      29     7     5                g♭      46     8    52 

          d      . .    15   . .                d        1    61    10 
 OC       c        7    12   . .      OT        c        2     8    42 
          g♭      22    24     6                g♭      18    29    52 

          d        4    11     1                d      . .    51    28 
 TC       c       15     9    12      CT        c        2    20    34 
          g♭      14    10    43                g♭      23    13    49 

 No response:   FD(NA)-1    OG-1     CC-1     TG-1 
                FC(NA)-1    TOC-2    CG-1     OTD-1 
                TFD-1                FCD-1    OTG-1 
                                              CTC-2 




GROUP B 

              FLUTE                            OBOE 
           Oboe  Clar  Trpt               Flute  Clar  Trpt 

FF     d     11    20     1    OO     d       2     4     4 
       c      3     7     3           c       1    17     4 
       g     ..     5     1           g       8    40    10 

F(NA)  d     15    12    ..    O(NA)  d      ..    10     4 
       c      6     7     1           c       3    15     5 
       g      1     8    ..           g      14    51     6 

OF     d     27    27     3    FO     d       1     4     1 
       c     10    16     3           c       4    22    16 
       g      1     9     1           g      16    54     2 

CF     d     19    27     4    CO     d       2    11     4 
       c     12    13     1           c       3    22    13 
       g      7    13     5           g      18    48    13 

TF     d     12    10    33    TO     d      ..     9    17 
       c      3     8    35           c       1    15    46 
       g      4     9    10           g      12    44    20 

              CLARINET                         TRUMPET 
          Flute  Oboe  Trpt               Flute  Oboe  Clar 

CC     d     ..    15    ..    TT     d      ..     7     1 
       c      9    10     1           c      ..     3     2 
       g     10    16     8           g      20     6    21 

C(NA)  d     ..    22     2    T(NA)  d      ..    27     6 
       c     19     8    ..           c       2    14    14 
       g     12    12     3           g      18    17    40 

FC     d      1    16     2    FT     d      ..    34     6 
       c     12    13    ..           c       3     9    15 
       g     19     6     1           g      38     6    40 

OC     d      1    17     3    OT     d      ..    36     6 
       c      8    16     1           c      ..    10    20 
       g     15    11    ..           g      15    15    44 

TC     d      1    16     5    CT     d      ..    27    14 
       c      9     5    13           c       3    19    15 
       g     12     4    38           g      19     5    34 

No response: OC(NA)-1, COD-2, TOD-1, CG-1, TD(NA)-1, OCG-1, OTC-1, 
CTD-1 

Group C responses appear in the text of this report. 

TABLE 8: TABLES OF PERCENTAGES 

PERCENT OF NORMAL TONES CORRECTLY IDENTIFIED 

                      Steady-State 
Group      Flute   Oboe   Clar   Trpt   Average 
A           73.7   75.4   76.9   70.8     74.01 
B           80.2   65.1   73.3   76.7     73.35 
C           92.5   70.2   87.7   82.9     82.82 
Average     82.1   70.2   79.3   76.8     76.73 

PERCENT OF NO-ATTACK TONES CORRECTLY IDENTIFIED 

                      Steady-State 
Group      Flute   Oboe   Clar   Trpt   Average 
A           72.5   64.0   67.5   42.1     60.75 
B           80.6   58.1   69.8   46.5     62.99 
C           89.0   66.6   82.0   46.1     70.29 
Average     80.7   62.9   73.1   44.9     64.68 

PERCENT OF ALTERED TONE STEADY-STATES CORRECTLY IDENTIFIED 

                      Steady-State 
Group      Flute   Oboe   Clar   Trpt   Average 
A           52.6   55.^   71.5   34.8     53.22 
B           58.4   46.0   68.3   44.1     54.07 
C           75.6   50.1   78.5   45.9     61.85 
Average     62.2   50.6   72.8   41.6     56.38 

PERCENT OF ALTERED TONE STEADY-STATES 
IDENTIFIED AS THE ATTACKING INSTRUMENT 

                      Steady-State 
Group      Flute   Oboe   Clar   Trpt   Average 
A           25.1   18.7   15.8   25.2     21.22 
B           21.8   23.9   17.1   21.3     21.03 
C           11.3   26.0   12.0   22.4     17.90 
Average     19.4   22.9   15.0   23.0     20.05 

PERCENT OF ALTERED TONE ATTACKS 
IDENTIFIED AS THE ATTACKING INSTRUMENT 

                         Attack 
Group      Flute   Oboe   Clar   Trpt   Average 
A           11.7   22.7   29.7   20.8     21.22 
B           12.1   18.5   25.5   28.0     21.03 
C            6.3   15.8   27.8   21.8     17.90 
Average     10.0   19.0   27.7   23.5     20.05 

PERCENT OF ALTERED TONE ATTACKS 
IDENTIFIED AS THE STEADY-STATE INSTRUMENT 

                         Attack 
Group      Flute   Oboe   Clar   Trpt   Average 
A           56.2   55.5   47.3   55.8     53.22 
B           55.9   59.3   52.1   49.5     54.07 
C           61.8   67.8   59.3   61.2     61.85 
Average     58.0   60.9   52.9   55.5     56.38 

Appendix III: COMPUTER ANALYSIS SYSTEM SOFTWARE 

Jack Owens 

This section describes a library of computer programs 
used to perform certain basic mathematical analyses of the 
digitized musical performances. A package of programs 
performing optional supporting functions is also described. 

The computational subroutines extract from the raw 
data quantitative information usually used to characterize 
acoustical signals. There are three subroutines: one to 
obtain the signal intensity, another the fundamental frequency, 
and the third the overtone spectrum. Associated with each 
is an optional I/O subroutine which may be used to store and 
retrieve randomly on a disk pack the output of the sub- 
routines. This reduces the necessity to re-compute results 
when they must be used repetitively. There is likewise a 
program which translates the digitized information on the 
input tape into a computer format and then stores the results 
on a disk pack for subsequent random access. Finally, a main- 
line program is provided which controls these subroutines 
and displays their output on a 2250 cathode ray display 
terminal as directed by the terminal operator. This enables 
the operator to specify features of interest in the data 
efficiently, so that they may be retained for subsequent 
processing. Visual access to the data in this way was 
deemed necessary because of the large volume involved. 
It would be unnecessarily expensive to process completely all 
of the data, or even to graph all of it on a computer print- 
out. 



Figure 1 shows the relationship among these programs 
when they are used in conjunction with the 2250. It 
emphasizes the independence of the programs from each other. 
Each may be incorporated individually in another application. 
Thus the user may select to use only those programs perform- 
ing functions needed for his particular problem. 



Programming Considerations 

The design philosophy of these programs is based upon 
several factors. In the first place, it is expected that 
they will be used by people not necessarily familiar with the 
details of the programming. This requires them to be easy 
to operate. Input arguments must be specified in a fashion 
relating in an obvious way to the application. Reliability 
is important. Faulty specification of arguments should lead 
to a default assumption or an error indication, rather than 
to ambiguous execution-time errors. 

Secondly, the programs will probably receive heavy usage. 
This is because the functions they perform are basic to many 
types of analyses of musical performances. Thus they must 
be efficient and not use excessive central processor time. 
They must be able to operate independently, so that only 
programs performing those functions required for a particular 
application need be used. Provisions must be made so that 
their performance can be optimized easily in different 
applications. 

Finally, allowances must be made for the possibility of 
future modifications in the logic of the programs. This ap- 
plies especially to the computational subroutines and the dis- 
play control program. This means that programming logic 
should be kept free of complexities and similar logic should 
be used to solve similar problems. 



Implementation 

In carrying out the programming, choices must be made of 
the programming language, how the analysis is to be segmented, 
and how the subroutine arguments are to be specified. It 
was decided to use PL/I for the computational subroutines 
(called "procedures" in PL/I) and the main-line display 
control program. It is expected that the former will be used 
in main-line programs written in PL/I, so they could not be 
written in FORTRAN. The use of assembly language would make 
future modifications of their logic extremely difficult, 
so PL/I was chosen. This requires the calling program, which 
in this case is the display control program, to be in PL/I. 
The loss of efficiency resulting from this choice was deemed 
insignificant. To minimize this loss, none of the more 
sophisticated, and consequently less efficient, features of 
PL/I were used. It should therefore be easy to translate 
these programs into FORTRAN should it prove desirable. 

In choosing a language for the I/O programs, efficiency 
was the deciding factor. It turns out to be quite easy to 
access the data in assembly language, but very difficult 
in any higher-level language. This is because the organization 
of the data set does not conform to any of the standard 
IBM-supplied formats. A considerable savings in efficiency 
is effected by using assembly language. Fortunately the logic 
of the programs is probably not subject to alteration, so 
including parameters to modify table lengths and the like 
provides them with sufficient flexibility. These matters 
are discussed in more detail below. 

It was decided to limit segmenting of the analysis as 
much as possible to the minimum amount dictated by the 
problem. This is because a large number of housekeeping 
chores are performed at the beginning and end of each PL/I 
subroutine, making extensive use of subprograms quite costly. 
Thus the number of entries into a computational subroutine 
is limited by having each calculate a whole segment of 
data as specified by its arguments, rather than just one 
data point at a time. This produces a gain in efficiency and 
makes the programs easier to use. To some extent it also makes 
them more difficult to understand and therefore harder to 
modify. 

The manner in which the subroutine arguments are speci- 
fied was chosen to make them easy to operate from the stand- 
point of the user. For example, the starting and ending 
points of a segment of data are specified in terms of their 
times in seconds (and decimal fractions thereof) relative to 
the beginning of the performance in the actual time frame of 
the performance. Time intervals are likewise expressed in 
seconds. Other variables specify such things as the number 
of data elements per second and the number of data elements 
in a record of raw data. 

As mentioned above, the computational subroutines determine 
the signal intensity, its fundamental frequency (or pitch 
period), and its overtone spectrum. The programs are controll- 
ed by specifying the points in time at which the information 
is required. The starting time, the time interval between 
points, and the number of points required suffice to determine 
this. By computing several points of data at one entry to 
the subroutine the total number of entries is minimized. 

The programs perform the analysis by operating on the raw 
data contained in an interval straddling the data point of 
length equal to the pitch period. Knowledge of the pitch 
period is of fundamental importance. 

The technique for determining the pitch period is the 
mathematical analog of the following physical experiment. 
Suppose one holds a vibrating tuning fork near the open end 
of a hollow cylinder whose other end is closed off with a 
movable piston (see Figure 2). This causes the air column 
inside the cylinder to vibrate. Moving the piston changes the 
length of the air column. As this is done, a position will 
be found at which the tone produced by the tuning fork becomes 
considerably louder. The condition of maximum loudness is 
called "resonance". If there is no node between the piston 
and the mouth of the tube then knowing the length of the air 
column gives the wavelength of the tone and hence its 
frequency. This may not be the fundamental frequency of the 
tuning fork because one of its overtones may be excited. 
This may be determined by searching for resonances of longer 
wavelength by lengthening the air column. 

Figure 2. Obtaining the Wavelength from a Resonance 
(Standing wave pattern at resonance; wavelength = 4L) 

The mathematical analog of this used in this program 
performs a discrete Fourier analysis of the raw data over 
some interval about a point. The first Fourier coefficient 
is examined and the size of the interval is varied until it 
attains a maximum value. This corresponds in the above 
example to changing the length of the air column to obtain a 
resonance. The lowest resonance is sought by doubling the 
length of the interval and repeating the Fourier analysis. 
If the size of the first Fourier coefficient is sufficiently 
different from zero the process is repeated. Otherwise 
the length of the previous interval in seconds is taken to 
be the pitch period and its reciprocal the fundamental 
frequency. Since the data involved is primarily music, the 
search is first made using intervals of ^ step on the temper- 
ed musical scale. Smaller intervals are then used to 
locate the maximum. The Cooley-Tukey fast Fourier transform 
algorithm is used for the Fourier analysis. 
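
The doubling step described above can be illustrated with a short sketch. It is written in Python rather than the PL/I of the original programs, and it evaluates the first Fourier coefficient by a direct sum rather than by the Cooley-Tukey algorithm. The function names, the absolute threshold, the sampling rate, and the test signal are all assumptions made for illustration. The search for the resonance itself is left out: the sketch starts from an interval length at which a resonance is taken to have been found already.

```python
import cmath
import math


def first_coefficient(samples, start, length):
    """Magnitude of the first Fourier coefficient over samples[start:start+length]."""
    step = -2j * math.pi / length
    total = sum(samples[start + k] * cmath.exp(step * k) for k in range(length))
    return abs(total) / length


def lowest_resonance(samples, start, length, threshold=0.01):
    """Double the interval while the first coefficient stays clearly non-zero.

    `length` is an interval (in samples) at which a resonance was found.
    `threshold` is a hypothetical stand-in for "sufficiently different from zero".
    """
    while start + 2 * length <= len(samples):
        if first_coefficient(samples, start, 2 * length) <= threshold:
            break  # nothing at half the frequency: `length` is the pitch period
        length *= 2
    return length


# Test signal with a period of 200 samples: a weak fundamental and a strong
# second harmonic, so that a resonance search could settle on 100 samples.
signal = [0.3 * math.sin(2 * math.pi * k / 200) + math.sin(2 * math.pi * k / 100)
          for k in range(800)]

period = lowest_resonance(signal, 0, 100)
rate = 10000.0                          # assumed data elements per second
pitch_period_seconds = period / rate
fundamental_hz = 1.0 / pitch_period_seconds
```

Starting from 100 samples, the coefficient over 200 samples is clearly non-zero (the weak fundamental), so the interval is doubled; over 400 samples it vanishes, and 200 samples is kept as the pitch period.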

The intensity of a signal may be easily found once its 
fundamental frequency is known. It is equal to the square of 
the signal integrated over one period and divided by the 
length of the period. In this instance the trapezoidal rule 
was deemed accurate enough for doing the integration, so 
that the problem reduces to summing the squares of the data 
points over a period. The overtone spectrum is determined 
by carrying out the complete Fourier analysis over an inter- 
val of length equal to the pitch period. 
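
Both calculations are short enough to sketch. The Python below is an illustration under assumptions, not the original PL/I: the function names, the centring of the interval on the data point, and the test tone are hypothetical, and the Fourier sums are evaluated directly. For a periodic signal the trapezoidal rule over one full period gives the same result as the plain sum used here, because the two end points are equal.

```python
import cmath
import math


def intensity(samples, centre, period):
    """Mean of the squared samples over one pitch period straddling `centre`."""
    start = centre - period // 2
    window = samples[start:start + period]
    return sum(v * v for v in window) / period


def overtone_spectrum(samples, centre, period, harmonics):
    """Amplitudes of the first `harmonics` Fourier components over one period."""
    start = centre - period // 2
    amplitudes = []
    for h in range(1, harmonics + 1):
        step = -2j * math.pi * h / period
        total = sum(samples[start + k] * cmath.exp(step * k) for k in range(period))
        amplitudes.append(2 * abs(total) / period)
    return amplitudes


# Hypothetical tone: pitch period of 100 samples, fundamental of amplitude 2.0
# and a third harmonic of amplitude 0.5.
tone = [2.0 * math.sin(2 * math.pi * k / 100)
        + 0.5 * math.sin(2 * math.pi * 3 * k / 100)
        for k in range(400)]

level = intensity(tone, 200, 100)
spectrum = overtone_spectrum(tone, 200, 100, 4)
```

For this tone the intensity is the sum of the half-squares of the two amplitudes, and the spectrum recovers the amplitudes of the first and third harmonics with nothing at the second and fourth.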

The I/O programs servicing the computational subroutines 
are similar to each other in function and design but quite 
distinct from the program which is used to access the raw 
data. The function of the former is to eliminate the necessity 
of re-computing results when a large volume must be processed 
repetitively. For example, a person operating the 2250 may 
wish to scan back and forth through a large segment of data 
representing the pitch of the signal. Storing the calcula- 
tions the first time they are needed and calling them back 
subsequently results in economical utilization of the central 
processor and core storage. The program handling the raw 
data allows an entire performance to be accessed randomly by 
converting all of the data into the computer format and storing 
it on the disk. This minimizes the number of cumbersome tape 
operations required and makes it necessary to translate the 
data only once. 

Output data from the computational subroutines are stored 
on the disk in records of fixed length, representing a fixed 
interval in time. To simplify handling the data each record 
is defined to begin at a pre-determined point in time 
relative to the beginning of the performance, regardless of 
the time at which data is requested. These times are 
determined from the record length and the time interval between 
points, both of which are established by the user. From this 
information, given a request for a data element as a time 
expressed in seconds, the program can uniquely decompose 
the request into two integers, one giving the record number 
relative to the beginning of the performance and the other 
the location of the desired data element relative to the 
beginning of the record. If the time of the request does not 
correspond to one of the pre-assigned data points this proce- 
dure effectively truncates the time downward so that it does 
correspond to a valid data point. 
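
The decomposition of a request time into two integers can be sketched as follows. This is a minimal Python illustration, not the original code; the function name, the argument names, and the choice to count records and positions from zero are assumptions.

```python
import math


def locate(time_s, interval_s, points_per_record):
    """Split a request time into (record number, position within the record).

    Record numbers and positions are counted from zero here; that numbering
    is an assumption of this sketch.
    """
    # Truncate the time downward to the nearest pre-assigned data point.
    point = math.floor(time_s / interval_s)
    return divmod(point, points_per_record)


# With a point every 0.125 s and 8 points per record, each record spans 1 s.
example = locate(2.30, 0.125, 8)
```

A request at 2.30 seconds falls between the points at 2.25 and 2.375 seconds, so it is truncated to the point at 2.25 seconds, which is position 2 of record 2.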

With these conventions, accessing the data on the disk 
would be very easy if disk storage space could be reserved 
for all of the data that might be computed for a performance. 
The amount of space required for this is prohibitive, how- 
ever, so space is allotted for only a fixed number of records, 
corresponding to a time interval within the performance of 
fixed length. This length is established by the user for 
optimum efficiency. The location in time of the interval 
relative to the beginning of the performance is allowed 
to vary as requests are made for data not represented by 
the current time interval. 

Accessing the data under these circumstances is a little 
complicated. To understand the difficulties which arise, 
consider the example shown in Figure 3. Here it is assumed 
that space has been allotted for n records, stored contigu- 
ously in physical locations on the disk designated by R1, 
R2, R3, ..., Rn. The data stored in each record represent 
specific points in time relative to the beginning of the 
performance. The time of each record is given by specifying 
the time of the first data point in each record, say T1, 
T2, T3, ..., in chronological order with the earliest 
record of time T1 stored in the first physical location R1 
and the latest record of time Tn stored in the last physical 
location Rn. If the record in storage with the earliest time 
is always to be stored in the first physical location, then 
there is no fixed relationship between the time of a given 
record and its location in storage. Control data such as the 
time of the earliest record and the length of the space 
allotted can be used to decide if a given record of a particu- 
lar time is represented by the records which are currently 
stored. 

Now suppose it is desired to add two new records to the 
data set corresponding to times T(n+1) and T(n+2). Since 
all of the available space is taken up, the time interval 
represented by the data stored must be shifted upward by 
deleting the two earliest records, T1 and T2. If the physic- 
al location R1 is always to contain the record with the 
earliest time then all of the records must be shifted, so that 
T1 is replaced by T3, T2 by T4, and so on. Then the two 
new records can be added to the end of the data set, so that 
the physical location Rn always contains the record with the 
latest time, which is now T(n+2). The control data is then 
updated to reflect the fact that the earliest record in 
disk storage now corresponds to time T3. 

Although the logic involved in the above scheme is 
simple, it is clearly quite inefficient, as the entire data 
set has to be re-written every time the time-interval which 
the data represents is moved. The approach actually taken is 
to replace the records which are to be deleted by those 
which are added. In the foregoing example, T1 and T2 would 
be replaced by T(n+1) and T(n+2) respectively. The record 
corresponding to the earliest time stored is now not neces- 
sarily contained in the first physical record, R1, so that 
its physical location must be accounted for by an additional 
entry in the control data. The data set can be thought of 
as being organized in a ring, as shown in the bottom diagram 
of Figure 3. The record contained in R1 may be considered 
as following that contained in Rn. 

An important feature of this type of organization is 
that there is a correspondence between the time of a record 
and its physical location in the data set. This is obtained 
by computing the record number from its time, as discussed 
above, modulo the number of records allotted to the data 
set plus one; that is, modulo n + 1 in this example. When 
they are stored on the disk, records numbered 1 through n 
are stored sequentially in physical locations R1 through 
Rn, as are records numbered n + 1 through 2n, 2n + 1 through 
3n, and so forth. This relationship may be calculated by 
expressing the quotient of the record number divided by n + 1 
as an integer and a remainder. When the remainder is zero 
the record will be stored in R1; if it is one, the record will 
be stored in R2, and so on. Because the data set organization 
is transparent to algorithms like this, accessing it is 
fairly efficient. 
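
The ring organization can be sketched in a few lines of Python. This is an illustration under assumptions rather than the original assembly language: the class and method names are hypothetical, in-memory lists stand in for the disk, and records and locations are counted from zero so that the location is simply the record number modulo the number of slots, which is one consistent reading of the rule given in the text.

```python
class RingDataSet:
    """A fixed number of record slots reused cyclically (a sketch only)."""

    def __init__(self, slots):
        self.slots = slots
        self.numbers = [None] * slots   # control data: record number held by each slot
        self.data = [None] * slots      # stands in for the records on the disk

    def slot_for(self, record_number):
        # Zero-based variant of the modulo rule described in the text.
        return record_number % self.slots

    def store(self, record_number, values):
        slot = self.slot_for(record_number)
        self.numbers[slot] = record_number   # overwrites the record it displaces
        self.data[slot] = values

    def fetch(self, record_number):
        slot = self.slot_for(record_number)
        if self.numbers[slot] == record_number:
            return self.data[slot]
        return None                          # not currently held in the data set


ring = RingDataSet(4)
for n in range(4):
    ring.store(n, "record %d" % n)
ring.store(4, "record 4")    # replaces record 0 in the first slot
ring.store(5, "record 5")    # replaces record 1 in the second slot
```

After the two additions the slots hold records 4, 5, 2, and 3 in that physical order, so the earliest record (2) is no longer in the first slot, and no record has been moved.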

Information about each record is stored in a table which 
contains a series of entries for each record for which 
space is reserved. Rather than accessing the data directly, 
the program computes the location of the entries correspond- 
ing to the desired record. This tells the program if the 
data record has been computed and stored and if so where to 
find it. The actual device address of the record is stored 
so that it may be retrieved quickly. The table is kept 
updated to represent the current status of each record at 
all times. 

If the requested data has not been computed the program 
establishes the arguments required by the appropriate compu- 
tational subroutine and returns control to the calling prog- 
ram with an indication of this. The latter can then call the 
computational routine and subsequently return control to the 
I/O program which writes the new data on the disk. This 
process continues until the original request for data is 
satisfied. Thus the user need not control directly the 
computational subroutines, but rather he can think in terms of 
the segments of data required. 
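
The exchange between the I/O program and the calling program can be sketched as a small loop. The Python below is illustrative only: a dictionary stands in for the disk, and the function names, the status strings, and the stand-in computational routine are all hypothetical.

```python
def request(store, record_number):
    """I/O side: return ('ready', data) or ('compute', arguments)."""
    if record_number in store:
        return "ready", store[record_number]
    # Not yet computed: hand back the arguments the computational routine needs.
    return "compute", {"record_number": record_number}


def get_record(store, record_number, compute):
    """Calling-program side: loop until the original request is satisfied."""
    while True:
        status, payload = request(store, record_number)
        if status == "ready":
            return payload
        # Call the computational routine, then hand the result back for storage.
        store[payload["record_number"]] = compute(**payload)


calls = []


def fake_pitch_routine(record_number):
    """Stand-in for a computational subroutine; records each call it receives."""
    calls.append(record_number)
    return [record_number * 10 + i for i in range(3)]


disk = {}
first = get_record(disk, 7, fake_pitch_routine)
second = get_record(disk, 7, fake_pitch_routine)
```

The second request for the same record is answered from storage, so the computational routine is entered only once.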

The program handling the raw data performs three basic 
functions. It provides calibration information in the form 
of the number of data elements per second and the time 
interval in seconds between data points so that the exact 
time relative to the beginning of the performance of each 
data point is determined. It reads the entire performance 
off the digital tape, translates it into the computer format, 
and stores the results on the disk. Finally, it accesses the 
data off the disk at random as requested. 

Calibration information is obtained from a calibration 
record at the beginning of the tape consisting of a wave such 
as a sine wave of known frequency. The program counts the 
number of pitch periods and data elements so that the 
calibration parameters can be computed from the frequency. 
A sufficiently long calibration record will provide a 
large number of complete pitch periods and data samples so 
that instabilities in the equipment can be averaged out. 
If the calibration record is recorded at the same speed 
as the performance all times will be in the actual time frame 
of the performance, so a stop watch can be used to determine 
the times of features of interest in the original tape. 
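
The calibration arithmetic can be sketched as follows. This Python illustration assumes that pitch periods are counted at positive-going zero crossings, which the text does not state; the function name and the synthetic calibration record are likewise hypothetical.

```python
import math


def calibrate(samples, frequency_hz):
    """Return (data elements per second, seconds between data points)."""
    # Positive-going zero crossings mark the start of each pitch period.
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0.0 <= samples[i]]
    periods = len(crossings) - 1                 # complete periods counted
    elements = crossings[-1] - crossings[0]      # data elements they span
    rate = elements * frequency_hz / periods
    return rate, 1.0 / rate


# Hypothetical calibration record: a 100 Hz sine digitized at 10,000 elements
# per second, with an arbitrary starting phase.
record = [math.sin(2 * math.pi * 100 * k / 10000 + 0.3) for k in range(10000)]
rate, interval = calibrate(record, 100.0)
```

Counting over many complete periods, as the text recommends, keeps the error from the one-element uncertainty at each end of the count small.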

The process of reading the tape, translating the data 
into computer format, and writing it out again on disk stor- 
age poses no problem. The translation is easily accomplished 
in assembly language. Two buffers are used and controlled 
by the program. The tape reading operation is the slowest 
of the three operations so translation and writing on the disk 
proceed concurrently with reading in of the next tape record. 
Each performance is preceded by a header record and the tape 
is terminated by a trailer record. The operation continues 
until one of these is encountered. Then it stops and control 
reverts to the calling program. 

Providing for random access of the data from the disk 
posed some problems because the records produced by the ana- 
log-to-digital converter are not of uniform length. Thus 
it is not possible to compute directly the record in which a 
data element of a given time resides. This problem is 
circumvented by establishing a table of data with a series of 
entries for each record for which space is allotted on the disk. 

When the program is initialized disk space is allotted 
for the maximum number of records of the maximum length to be 
encountered. These values are determined by the user. The 
device address of each record is stored in the table at this 
time. When the performance is read in, the number of data 
elements in each record and the starting time of each record 
are recorded in the table as the records are processed. The 
length of each record is determined from the maximum record 
length and data and is used by the control program for reading 
the tape, which is accessible to the assembly language 
program. The calibration factor, number of data elements in 
the previous record, and size of the tape record gap are used 
to calculate the starting time of each record. When all of 
the records of the current performance are read in the 
(variable) total number of records is recorded and an indica- 
tion is placed after the entry in the table corresponding 
to the last record read. 

When a request for a data element of a given time is 
made the record containing that data element is located by 
searching this table. The search is not sequential, but 
proceeds in a more efficient pattern. This is done to reduce 
the search time, although it leads to more complicated 
logic. 
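
The text does not say which search pattern is used. The sketch below assumes a binary search over the starting times, using Python's standard `bisect` module; the function name, the table layout as two parallel lists, and the example values are hypothetical.

```python
import bisect
import math


def find_element(start_times, counts, rate, time_s):
    """Locate the record and position holding the data element at `time_s`.

    start_times[i] is the time of the first element of record i and counts[i]
    is the number of elements it holds. Returns (record, position), or None
    when the time falls before the first record or in a gap between records.
    """
    record = bisect.bisect_right(start_times, time_s) - 1   # binary search
    if record < 0:
        return None
    position = math.floor((time_s - start_times[record]) * rate)
    if position >= counts[record]:
        return None
    return record, position


# Four records of unequal length at 8 elements per second, with a gap
# between the third and the fourth.
starts = [0.0, 1.0, 2.5, 3.25]
counts = [8, 12, 4, 8]
hit = find_element(starts, counts, 8.0, 2.75)
miss = find_element(starts, counts, 8.0, 3.1)
```

Because the records are of unequal length, the record cannot be computed directly from the time; the table of starting times is what makes the lookup possible.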



The display control program is a main-line PL/I 
program which enables the user to obtain a graphic display 
of the raw data and the outputs of the computational sub- 
routines on the 2250 cathode ray display console. 

The console is equipped with a typewriter-like key- 
board and a set of 32 push buttons. Using the keyboard 
the user can enter data into the buffer of the 2250 from which 
it is displayed and made available to the program. When a 
push button is pressed it can be sensed by the program enabl- 
ing it to respond in a pre-determined fashion. 

The display control program uses the keyboard to enable 
the operator to specify the type of display he wants, the 
time of the frame he wishes to be displayed, and the scale 
option he desires. There are three types of displays: one for 
the raw data, one showing the pitch period and amplitude, 
and one showing the Fourier coefficients. The scale options 
determine the time spanned by a single frame. There are two 
options for plotting the raw data and three for the other two 
displays. 

The push buttons are used to manipulate the display and 
perform certain control functions. The display may be 
translated forward or backward in time by a full frame or one- 
third of a frame. Its scale may be expanded or compressed. 
Control functions performed enable the operator to revert 
to functions performed on the keyboard such as requesting 
a different type of display or a new time. There is a 
button which causes the program to begin processing the next 
performance in sequence and another which terminates process- 
ing altogether. 

The program logic is diagrammed in Figure 1, where it is 
divided into several blocks according to function. When 
generation of the display begins the program is placed in the 
wait status until a button is pushed or the end key on the 
keyboard is depressed, signaling the end of data entry. 
The program senses exactly what has taken place and transfers 
control to the appropriate point in the program to handle 
the condition. 

The display is generated and controlled using the Graphic 
Subroutine Package supplied by IBM. These generate the 
actual graphic orders and data. It is through them that the 
program enters the wait status and senses the condition that 
caused it to leave this status. They also provide the program 
with the data entered from the keyboard. 

It is possible not to use these subroutines; subroutines 
could have been written in assembly language using the graphics 
access method especially for this application. It was 
decided not to do this because of the difficulty of such a 
task and the problem of making future modifications of the 
program. Using the graphic subroutine package makes it much 
easier to make modifications, although there is some loss in 
efficiency. 

Appendix IV: SCANNER/SYNTHESIZER INTERFACE 

An electronic solid-state interface for the EMR optical scanner 
and a voltage controlled synthesizer. Original design - Ernest 
Guignon; design modifications - Philip Bognar. 



The scanner/synthesizer interface was designed to convert the timing 
and signal pulses from the scanner to a suitable input for a Moog (or 
equivalent) voltage controlled oscillator (VCO) and/or a voltage con- 
trolled amplifier (VCA). The interface is represented schematically in 
Figure 1 and operationally in Figure 2. A second interface channel is 
available which is identical in function, but which utilizes the ramp 
generator output from channel one. 

The two inputs provided by the optical scanner to the interface are 
identified as: A. Reset and D. Signal Pulse. At the beginning of 
each optical scan, a reset pulse (A) is generated. After level shifting 
it is applied to the ramp generator, and resets the ramp output to zero 
volts. The ramp generator employs an operational amplifier in an inte- 
grator configuration to produce a linear, sawtooth shaped output func- 
tion, designated a "ramp" (Figure 2, line B). This waveform, which pro- 
vides a voltage proportional to the position of the optical scan, is ap- 
plied to the input of the track and hold module and to the window gener- 
ator. 



The window generator limits the active scan region for the channel. 
These limits can be selected by the operator, and correspond to bound- 
aries on the paper fed to the scanner. This allows the paper to be di- 
vided lengthwise into two regions, each controlling a separate informa- 
tion channel. Whenever the ramp generator output voltage is within 
the preset limits, the output of the window generator (a dual compara- 
tor) is at a logical "1" voltage level. During the remainder of the 
sweep, the window is "closed" at a logical "0" level (line C). During 
normal operation the channel 2 window will be a logical complement to 
channel 1 (channel 1 is open when channel 2 is closed, and the converse). 

A signal pulse from the optical scanner (line D) indicates that a 
mark appears on the paper at the concurrent scan position. The signal 
pulse is presented as input to a logical "and" module, along with the out- 
put of the window generator. The mono-pulser will be triggered only 
when the signal pulse occurs within the window limits set for the chan- 
nel. In Figure 2, signal pulses 1 and 2 (line D) occur within the wind- 
ow, so that sample command pulses are generated (line E). Signal pulse 
3 occurs outside the window, so that no sample command is generated. 

A sample command pulse from the mono-pulser causes the track and hold 
module to sample the concurrent value from the ramp generator, and to 
transfer it to the output amplifier (line F). The output voltage is a 
linear function of the position of the line sensed by the optical scan- 
ner. The output amplifier provides adjustable output gain and zero off- 
set voltage, and protects the track and hold module from a short circuit 
condition at the output. 
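
The signal path from ramp generator to track and hold module can be followed in a small simulation of one scan of one channel. The Python below is a sketch under assumptions: the 10 volt ramp peak, the expression of mark positions as fractions of the scan, and all names are hypothetical, and the output amplifier's gain and offset are left out.

```python
def scan_output(marks, window, held=0.0, ramp_peak=10.0):
    """Simulate one optical scan of one channel.

    marks      positions of marks along the scan, as fractions from 0 to 1
    window     (low, high) ramp voltages between which the window is open
    held       voltage held by the track and hold module before this scan
    ramp_peak  assumed ramp voltage at the end of the scan
    Returns the held voltage after the scan and the number of sample commands.
    """
    low, high = window
    commands = 0
    for position in sorted(marks):
        ramp = ramp_peak * position          # ramp voltage follows the scan position
        window_open = low <= ramp <= high    # window generator (dual comparator)
        if window_open:                      # "and" of signal pulse and window
            held = ramp                      # sample command: hold the ramp value
            commands += 1
    return held, commands


# Three marks, as in Figure 2: the first two fall inside the window and the
# third falls outside it.
held, commands = scan_output([0.25, 0.5, 0.875], window=(1.0, 6.0))
```

Two sample commands are generated, and the module is left holding the ramp voltage of the second mark.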



The squelch circuit produces an "off" condition when no sample com- 
mand has appeared for two or more consecutive optical scans (line G), 
as indicated by the reset pulse. The two level output of the squelch 
provides an on-off input to a voltage controlled amplifier. 

Appendix V: HARMONIC SYNTHESIZER 

J. Ricci 



The harmonic synthesizer was originally conceived as a flexible in- 
strument for on-line realization of complex musical events, and prom- 
ises to be an extremely useful tool for experiments in auditory percep- 
tion. What is desired is a device which will permit real-time syn- 
thesis of functions which are "small-scale periodic" - that is, funct- 
ions whose Fourier component amplitudes are essentially invariant over 
an appreciable number of periods of the fundamental, although they may 
be varying dynamically with time. At any instant of time we would like 
to be able to specify the amplitudes of the first N harmonics of such a 
waveform by means of knob settings or, preferably, by means of control 
voltages. Pictorially, we would have the input-output relationship 
shown in Fig. 1, where the inputs to the device are the control voltages: 



$V_f(t)$: Proportional to fundamental frequency at time t. 

$v_1(t), v_2(t), \ldots, v_N(t)$: Proportional to the amplitude at time t 
of the N harmonics. 



And the device outputs an auditory waveform whose Fourier representation 
is: 

$$V_o(t) = k_a \sum_{i=1}^{N} v_i(t) \cos\left[ i\, k_f V_f(t)\, t \right]$$

where $k_f V_f(t)$ is the radian frequency of the fundamental around 
time t. 



An extra feature which we would certainly accept if we could get it 
would be some control over the phase relationships among the harmonics 
(Fig. 2). At t=0 we would initialize parameters $\theta_1$ through 
$\theta_N$ so that the output function becomes: 

$$V_o(t) = k_a \sum_{i=1}^{N} v_i(t) \cos\left[ i\, k_f V_f(t)\, t - \theta_i \right]$$
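
As a numerical illustration of the output function above, the 
following Python sketch sums N harmonics with specified amplitudes and 
phases. It assumes that the fundamental frequency and the amplitudes 
are held constant (the text allows them to vary slowly), and it takes 
the fundamental directly in Hz in place of the product of a scale 
constant and the control voltage. All names are hypothetical. 

```python
import math

def harmonic_output(t, fundamental_hz, amplitudes, phases=None, k_a=1.0):
    """Sum of N harmonics at time t (seconds).

    amplitudes[i-1] plays the role of v_i(t); phases[i-1] the role of theta_i.
    fundamental_hz is assumed constant over the interval of interest.
    """
    if phases is None:
        phases = [0.0] * len(amplitudes)
    w = 2.0 * math.pi * fundamental_hz      # radian frequency of the fundamental
    return k_a * sum(
        v * math.cos(i * w * t - theta)
        for i, (v, theta) in enumerate(zip(amplitudes, phases), start=1)
    )

# Three harmonics of a 100 Hz fundamental with amplitudes 1, 1/2, 1/3.
amps = [1.0, 0.5, 1.0 / 3.0]
value_at_zero = harmonic_output(0.0, 100.0, amps)
```

With all phases zero the harmonics add in step at t=0, so the value 
there is simply the sum of the amplitudes, and it recurs one period 
(10 msec) later. 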

An early attempt at a design which has subsequently been abandoned 
involved extracting the harmonics of a spike train by means of voltage 
controlled bandpass filters and then scaling them individually using 
voltage controlled amplifiers. Useful as a conceptual tool, the ap- 
proach would have been impractical for a number of reasons, to wit: 



1. The VCF's would have been exorbitantly expensive to construct 
if, in fact, it proved possible to construct them at all. 



2. Each such filter would require its own control voltage 
merely to position its center frequency at a given instant 
precisely where the desired harmonic is to be found. 

3. Each VCF would introduce a phase shift and a considerable 
temperature-stability problem. 

Accordingly, some other approach is required. Suppose we had a 
sequential machine which would output discrete binary samples of an 
archetypal sinusoid (Fig. 3). Input, X(n), to this machine is a train of 
clock pulses, and output consists of the binary samples: 

$$Y(n) = \left[ \cos \frac{2 \pi n}{T_n} \right]$$

where $T_n$ is the number of sample points in one period. If this is then 
input to a D/A converter having reference voltage $e_{ref}$, output becomes: 

$$e_o(n) = e_{ref} \cos \frac{2 \pi n}{T_n}$$



Now, say we want this voltage to have frequency f. We pulse the machine 
at the rate $n/t = f T_n$ pulses/sec, and the output samples as a function 
of time become: 

$$e_o(t) = e_{ref} \cos\left[ 2 \pi f\, s(t) \right]$$

where $s(t) = \frac{i}{f T_n}$ for $\frac{i}{f T_n} \le t < \frac{i+1}{f T_n}$ 
and $i = 0, 1, 2, \ldots$ 

If this output is then low pass filtered at less than half the pulse rate, 
we have by the sampling theorem: 

$$e_o(t) = e_{ref} \cos(2 \pi f t)$$
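
The relation between the held samples and the ideal cosine can be 
checked numerically. The Python sketch below clocks a cosine table at 
f times T_n pulses/sec and compares the unfiltered staircase output 
with the ideal cosine of frequency f. The values T_n = 400 and 
f = 100 Hz, and all names, are assumptions chosen for illustration. 
Because the held phase lags the true phase by less than one table 
step, the difference cannot exceed 2*pi/T_n. 

```python
import math

def make_table(T_n):
    """One period of the archetypal cosine, T_n samples."""
    return [math.cos(2.0 * math.pi * n / T_n) for n in range(T_n)]

def held_output(t, f, cos_table, e_ref=1.0):
    """Staircase D/A output at time t for a machine clocked at f*T_n pulses/sec."""
    T_n = len(cos_table)
    i = math.floor(t * f * T_n)     # index of the most recent clock pulse
    return e_ref * cos_table[i % T_n]

T_n = 400
f = 100.0
cos_table = make_table(T_n)

# Probe one 10 msec period at instants that avoid the clock edges.
times = [k * 1.0e-5 + 3.0e-7 for k in range(1000)]
worst = max(
    abs(held_output(t, f, cos_table) - math.cos(2.0 * math.pi * f * t))
    for t in times
)
```

The residual staircase error is what the low pass filter is asked to 
remove. 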



Before considering further the difficulties posed by this kind of 
approach, I should like to make note of a few inherent advantages 
which make the attempt seem well worth pursuing: 

1. The reference voltage, $e_{ref}$, need not be constant. In fact, 
the D/A converter is effecting a multiplication between $e_{ref}$ and the 
sinusoidal samples. Thus, with a converter of the proper design, we 
should be able to replace $e_{ref}$ by one of the $v_i(t)$ of Fig. 2 and 
thus obviate the need for voltage controlled amplifiers for amplitude 
scaling. 



2. If we can synthesize each harmonic with one of these sections 
such that each section can be pulsed from the same voltage controlled 
clock, we eliminate all the control voltages which were required to 
center the VCF resonant frequencies. 

3. There need be no phase difference among the harmonics unless we 
want them; and if special phase relationships are desired, we can spec- 
ify them at will by advancing each individual section an arbitrary num- 
ber of steps. 



The form of synthesizer we are approaching is shown in Fig. 4. To 
make this a bit clearer, let us use some realistic numbers. Suppose we 
want a range of fundamental frequencies from 100 through 3000 Hz, and 
N=20 harmonics. Frequency response of the instrument is to be 20 kHz, 
thus we will low pass filter at that frequency. To find the number of 
sample points, $T_n$, per cycle of the fundamental, we note that the 
sample frequency at worst-case condition is $100 T_n$. The sampling 
theorem requires $100 T_n \ge 2(20\ \mathrm{kHz})$, which means we can use 
$T_n = 400$ samples/cycle and filter a shade below 20 kHz. The machine 
for $f_1$ has one input and (say) ten outputs (for 10 bit resolution), 
and it requires 400 states. Clock pulse rate is 400(100) = 40 kHz for 
$f_1$ = 100 Hz, and it becomes 400(3000) = 1.2 MHz for $f_1$ = 3000 Hz, 
well within the capabilities of modern logic components. 
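
The arithmetic of this example is easily reproduced. The short Python 
sketch below recomputes the number of samples per cycle and the two 
extreme clock rates from the figures given in the text; the function 
and variable names are hypothetical. 

```python
import math

def samples_per_cycle(f_min_hz, cutoff_hz):
    """Smallest T_n satisfying f_min * T_n >= 2 * cutoff (sampling theorem)."""
    return math.ceil(2.0 * cutoff_hz / f_min_hz)

T_n = samples_per_cycle(100.0, 20_000.0)   # worst case is the lowest fundamental
clock_low_hz = T_n * 100.0                 # clock rate for f1 = 100 Hz
clock_high_hz = T_n * 3000.0               # clock rate for f1 = 3000 Hz
```

The worst case is always the lowest fundamental, since the clock runs 
slowest there. 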

However, the device as postulated has one serious drawback - each 
of the 20 sequential machines has 400 states. The $f_2$ machine uses 
its 400 states to store samples of two cycles, the $f_3$ machine stores 
three cycles, etc. In order to make the device more economical, we 
would like to have each machine store only one cycle, resulting in 
the number of states getting small for the higher frequency machines. 

In fact, if we can manage this kind of trick, the number of states 
for Mi will shrink quite rapidly as i increases. Using the typical 
numbers above, consider that M1 stores one cycle consisting of 400 
samples. Let M2 store one cycle of 200 samples (200 state machine). 
If we pulse both these machines at the same rate we will get two 
cycles out of M2 in the time it takes to get one cycle from M1, which 
is precisely what we want. Similarly, M4 has 100 samples, M8 has 50, 
M16 has 25, etc. That is, the number of states for machine Mi is 
400/i. Naturally, the difficulty arises for those values of i which 
do not divide evenly into 400. In fact, if we wanted to design the 
device such that each harmonic were exactly generated by this scheme, 
the number of states of M1 would have to be the product of the highest 
powers of all the prime numbers in the interval (1,20), and the pulse 
rates required would be unthinkable. 
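
To see how quickly the exact scheme gets out of hand, the following 
Python sketch lists which of the 20 machines fit evenly into 400 
states and computes the smallest size of M1 that every harmonic number 
from 1 to 20 divides, namely their least common multiple. The variable 
names are hypothetical; the numbers 400, 20, and 100 Hz come from the 
example in the text. 

```python
import math
from functools import reduce

T = 400    # states in M1, from the worked numbers in the text
N = 20     # number of harmonics

# Harmonics whose single cycle fits evenly: machine Mi needs T/i states.
exact_states = {i: T // i for i in range(1, N + 1) if T % i == 0}
# Harmonics for which 400/i is not an integer.
inexact = [i for i in range(1, N + 1) if T % i != 0]

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // math.gcd(a, b)

# Smallest M1 size that every i from 1 to N divides evenly.
exact_T = reduce(lcm, range(1, N + 1))
clock_at_100_hz = exact_T * 100    # pulses/sec for a 100 Hz fundamental
```

Only eight of the twenty machines come out exact with 400 states, and 
the exact design would need an M1 of 232,792,560 states, clocked at 
over 23 GHz even for the lowest fundamental. 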

A solution to this dilemma currently under investigation is to 
approximate desired behavior in the following way. Machine Mi stores the 
integer part of (400/i) samples. Clearly if we stopped here we would 
be in serious trouble, since inexact harmonics would be slightly higher 
in frequency than they should be, leading to successive interference 
effects from those pseudo-harmonics. However, the machines can be 
designed to inhibit the correct number of clock pulses so that at the 
start of each cycle of the fundamental all harmonics are in proper 
relation to each other. To make this clearer, let's consider M3. We 
store int(400/3) = 133 samples in this machine. At the end of one cycle 
of the fundamental, the $f_3$ waveform would be 1/400 ahead of where it 
should be. If, however, the 201st pulse is inhibited, then after 200 
pulses $f_3$ will be (1/800) ahead, after 201 pulses it will be (1/800) 
behind, and after 400 pulses it will be coincident with the ideal $f_3$. 
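
The bookkeeping for M3 can be followed with exact fractions. The 
Python sketch below counts how many table steps M3 has taken after a 
given number of clock pulses and compares that with the progress of an 
ideal third harmonic. The lead is expressed in table steps, each of 
which is roughly 1/400 of the fundamental period, so a lead of one 
half step corresponds to the 1/800 quoted above. The function name and 
the choice of units are assumptions made for illustration. 

```python
from fractions import Fraction

T = 400        # clock pulses per cycle of the fundamental
i = 3          # harmonic number
L = T // i     # int(400/3) = 133 samples stored in M3

def lead_in_steps(pulses, inhibited=()):
    """Table steps by which M3 leads an ideal third harmonic after `pulses` pulses."""
    steps_taken = pulses - sum(1 for p in inhibited if p <= pulses)
    # An ideal third harmonic covers i*L = 399 steps of phase in T = 400 pulses.
    ideal_steps = Fraction(pulses * i * L, T)
    return steps_taken - ideal_steps

free_running = lead_in_steps(400)     # no pulses inhibited
checkpoints = {p: lead_in_steps(p, inhibited=(201,)) for p in (200, 201, 400)}
```

Without inhibition M3 ends the cycle one full step ahead. With the 
201st pulse inhibited, the lead is +1/2 step after 200 pulses, 
-199/400 step (very nearly -1/2) after 201 pulses, and exactly zero 
after 400 pulses. 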

It is hoped that the amount of harmonic distortion introduced by such 
approximations will be negligible. If not negligible for a 
psychoacoustical laboratory instrument (in which case we can resort to 
equal-length machines), it might still be more than adequate for music 
and/or speech. Further, if necessary, the sampling theorem might be 
confoundable to some extent by low pass filtering the output mix at 
different frequencies as a function of $f_1$. For example, for 
$f_1$ < 500 Hz, we have $f_{20}$ < 10,000 Hz, and if we filter at 
10,000 Hz, we only require 200 samples/cycle. For 500 < $f_1$ < 3000, 
filtering at 20 kHz, we would have a worst-case requirement of 
$500 T_n \ge 40000$ or $T_n \ge 80$. Thus with such a scheme, 200 samples 
would be adequate. 
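
The two-band requirement can be tabulated as follows. This Python 
sketch takes the lowest fundamental in each band together with the low 
pass cutoff used for that band, and reports the number of samples per 
cycle that the more demanding band imposes. The 100 Hz lower limit is 
the bottom of the range assumed earlier in the text; the names are 
hypothetical. 

```python
import math

def required_samples(f1_min_hz, cutoff_hz):
    """Samples per cycle needed so that f1_min * T_n >= 2 * cutoff."""
    return math.ceil(2 * cutoff_hz / f1_min_hz)

bands = [
    # (lowest fundamental in band, low pass cutoff), both in Hz
    (100, 10_000),    # f1 below 500 Hz, so f20 stays below 10 kHz
    (500, 20_000),    # f1 between 500 and 3000 Hz
]
per_band = [required_samples(f_lo, cutoff) for f_lo, cutoff in bands]
T_n = max(per_band)   # one table length must serve both bands
```

The lower band governs, so switching the filter cutoff with the 
fundamental halves the table length from 400 to 200 samples. 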

To reduce the number of states of the component machines, it may be 
possible to filter the output of each stage at its own cutoff 
frequency, before mixing. The number of states required by the 
sampling theorem may thereby be reduced. Of course, for an instrument 
of laboratory quality, much attention will have to be paid to the 
phase effects of such an approach. 

Fig. 5 presents a set of waveforms generated directly via D/A 
conversion by computer simulation. For this case we have 160 samples for 
$f_1$ and are generating 11 harmonics. The "break" (pulse inhibition) 
positions for this set of photographs are arbitrary. In Fig. 6 we see 
the effects of the approximations in the filtered waveforms. Of course 
$f_8$ is a perfect sinusoid while $f_7$ and $f_9$ are not. The perceptual 
correlates of these distortions are under investigation. In particular, 
it will be interesting to discover whether apparent smoothness of a 
waveform affects perceived audio distortion. That is to say (Fig. 7), 
breaks which occur at points of zero derivative result in a waveform 
which appears to the naked eye to be a pure sinusoid, while breaks 
elsewhere are readily visible. In terms of Fourier spectrum the harmonic 
distortion is the same in both cases, but the visual difference is 
striking enough to make one wonder. In any event, we can apparently 
reduce the harmonic distortion as much as we are willing to pay for, 
and it seems that the attainment of an instrument of laboratory quality 
would be a relatively straightforward matter if adequate funding were 
available. 
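
The effect of the approximation on a single harmonic can be estimated 
with a small simulation. The Python sketch below generates one 
fundamental period from machine Mi with 160 clock pulses, inhibiting 
pulses at evenly spread positions, and measures what share of the 
signal energy falls at the intended harmonic. This is not the 
simulation that produced the figures: the even spacing of the breaks, 
the energy-share measure, and all names are assumptions made for 
illustration, and no output filter is modeled. 

```python
import cmath
import math

def machine_output(i, T):
    """One fundamental period (T clock pulses) from Mi with int(T/i) stored samples.

    A pulse is inhibited wherever the rounded step count fails to advance,
    which spreads the breaks evenly (an assumption; the text calls the
    break positions in its photographs arbitrary).
    """
    L = T // i                    # samples stored for one cycle of harmonic i
    steps_per_period = i * L      # table steps that must fill T pulses
    out = []
    for p in range(T):
        step = (p * steps_per_period + T // 2) // T   # a repeated value is a break
        out.append(math.cos(2.0 * math.pi * (step % L) / L))
    return out

def harmonic_purity(samples, i):
    """Share of total signal energy lying at harmonic i of the period."""
    T = len(samples)
    X_i = sum(x * cmath.exp(-2j * math.pi * i * n / T)
              for n, x in enumerate(samples))
    return 2.0 * abs(X_i) ** 2 / (T * sum(x * x for x in samples))

T = 160
purity = {i: harmonic_purity(machine_output(i, T), i) for i in (7, 8, 9)}
```

Under these assumptions M8 comes out as an exact sinusoid, since 8 
divides 160, while M7 and M9 fall slightly short of unity. 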
Figure 6. Data Samples, Before and After 

Figure 7. Distortion at Breaks 